Abstract. We study a discrete model of repelling particles, and we show using
linear programming bounds that many familiar families of error-correcting
codes minimize a broad class of potential energies when compared with all 
other codes of the same size and block length. Examples of these universally 
optimal codes include Hamming, Golay, and Reed-Solomon codes, among many 
others, and this helps explain their robustness as the channel model varies. 
Universal optimality of these codes is equivalent to minimality of their binomial 
moments, which has been proved in many cases by Ashikhmin and Barg. We 
highlight connections with mathematical physics and the analogy between 
these results and previous work by Cohn and Kumar in the continuous setting, 
and we develop a framework for optimizing the linear programming bounds. 
Furthermore, we show that if these bounds prove a code is universally optimal, 
then the code remains universally optimal even if one codeword is removed. 



1. Introduction 

Geometry has long played a key role in coding theory, starting with the work 
of Hamming [H50]: binary codes can be viewed as packings of Hamming balls in a 
discrete cube. This framework provides a powerful analogy between discrete and 
continuous packing problems, which has been extensively developed [CS99] and 
remains an active research topic. In this paper, we extend the analogy to a much 
broader relationship between coding theory and discrete models of physics. Of course, 
physics is related to coding theory in many ways, ranging from connections between 
spin glasses and codes [S89, S94] to the statistical physics of belief propagation and 
other applications of graphical models to coding theory (see [MU07] for a survey).
Applications of physics to coding theory typically focus on the limit as the block 
length tends to infinity. Instead, in this paper we show that certain classical codes 
are exact ground states of natural physics models. 

In addition to extending the analogy with continuous packing problems, our 
results can be thought of as addressing a philosophical problem. Many classical 
codes — such as Hamming, Golay, or Reed-Solomon codes — remain very popular, 
despite the many other good codes that have been found. Why should this be? One 
obvious answer is that these codes are particularly beautiful and useful, especially 
given the simplicity of their constructions. Another is that they were discovered 
early in the development of coding theory and had a chance to cement their place 
in the canon. We propose a third explanation: a code is most useful if it is robust, 
in the sense that it optimizes not just one specific measure of quality, but rather a 
wide range of them simultaneously. We will prove that these classical codes have a 
rare form of robustness that we call universal optimality, based on an analogy with
continuous optimization problems [CK07].

As we will explain after Lemma 3.2, a code is universally optimal if and only if 
all the binomial moments of its distance distribution are minimal. This problem has 
been studied by Ashikhmin and Barg [AB99], with a very different combinatorial
motivation (namely, counting pairs of codewords in subcodes with restricted support), 
and they proved that a number of codes have this property, including the cases 
mentioned above. Thus, universal optimality is not a new property. However, the 
physics motivation appears to be new, and we provide a conceptual framework for 
proving results such as universal optimality of the binary Golay code, whose only 
previous proof involved computer solution of nearly two dozen linear programs. We 
also prove strong structural results about these codes, including our most surprising 
result: if the linear programming bounds prove a code is universally optimal, then 
it remains universally optimal if any single codeword is removed. 

Let $\mathbb{F}_q$ denote an alphabet with $q$ elements, and let $|x - y|$ denote the Hamming
distance between words $x, y \in \mathbb{F}_q^n$. Of course, this notation suggests that $\mathbb{F}_q$ is a
finite field, but we will make no use of the field structure. However, we pick a
distinguished element $0 \in \mathbb{F}_q$ and write $|x|$ for $|x - 0|$.

We view $\mathbb{F}_q^n$ as a discrete model of the universe, and we envision a code $C \subseteq \mathbb{F}_q^n$
as specifying the locations of some particles. We wish to separate these particles 
as much as we can, and one natural way to do so is to let them repel each other. 
We will choose a pairwise potential function between the particles, and then we will 
study the ground states of this system, i.e., the particle arrangements that minimize 
the total energy. Different potential functions will obviously yield different ground 
states in general, but we will treat many different potentials on an equal footing. 

Given a code $C \subseteq \mathbb{F}_q^n$ and a function $f \colon \{1, \dots, n\} \to \mathbb{R}$, the potential energy of $C$
with respect to the potential function $f$ is defined to be

\[ E_f(C) = \frac{1}{|C|} \sum_{\substack{x, y \in C \\ x \ne y}} f(|x - y|). \]

The normalization factor of 1/|C| is not essential, because we will always compare 
codes of the same size, but it will prove convenient later. 
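
The definition can be illustrated with a short Python sketch. It assumes that the sum runs over ordered pairs of distinct codewords, which is consistent with the $1/|C|$ normalization of the distance distribution used later; the function names and the toy repetition code are hypothetical and serve only as an example.

```python
def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def potential_energy(code, f):
    """E_f(C) = (1/|C|) * sum of f(|x - y|) over ordered pairs x != y in C."""
    total = sum(f(hamming_distance(x, y))
                for x in code for y in code if x != y)
    return total / len(code)


# Hypothetical toy example: the binary repetition code of length 3
# with the inverse power law f(r) = 1/r.
repetition = [(0, 0, 0), (1, 1, 1)]
energy = potential_energy(repetition, lambda r: 1 / r)
```

Here the two ordered pairs each contribute $f(3) = 1/3$, and dividing by $|C| = 2$ gives an energy of $1/3$.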

To have any hope of understanding the ground states, we must restrict the 
class of potential functions under consideration. They should be decreasing, so the 
corresponding forces are repulsive, and we wish the repulsion to grow stronger as the 
points grow closer together. The completely monotonic functions extend these two
properties in a particularly compelling way, and they have found many applications
in physics. Let $\Delta$ be the finite difference operator, defined by

\[ \Delta f(n) = f(n+1) - f(n). \]

A function $f \colon \{a, a+1, \dots, b\} \to \mathbb{R}$ is completely monotonic if its iterated differences
alternate in sign via $(-1)^k \Delta^k f \ge 0$; more precisely, $(-1)^k \Delta^k f(i) \ge 0$ whenever
$k \ge 0$ and $a \le i \le b - k$.
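
As a rough illustration of this definition, the following Python sketch tests complete monotonicity of a function given by its list of values on $\{a, \dots, b\}$. It uses exact rational arithmetic to avoid rounding issues; the helper names and the three sample functions are assumptions chosen for the example.

```python
from fractions import Fraction


def finite_difference(values):
    """Apply the forward difference operator to a list of values."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def is_completely_monotonic(values):
    """Check (-1)^k Delta^k f(i) >= 0 for all k >= 0 and a <= i <= b - k.

    `values` lists f(a), f(a+1), ..., f(b) as exact numbers.
    """
    sign, current = 1, list(values)
    while current:
        if any(sign * v < 0 for v in current):
            return False
        current = finite_difference(current)
        sign = -sign
    return True


# Sample functions on {1, ..., 8} (illustrative choices).
inverse_square = [Fraction(1, r * r) for r in range(1, 9)]
linear_decreasing = [Fraction(10 - r) for r in range(1, 9)]
concave = [Fraction(100 - r * r) for r in range(1, 9)]
```

The concave example is decreasing and nonnegative but has a negative second difference, so it fails the test even though it is repulsive.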

Definition 1.1. A code $C \subseteq \mathbb{F}_q^n$ is universally optimal if

\[ E_f(C) \le E_f(C') \]

for every code $C' \subseteq \mathbb{F}_q^n$ with $|C'| = |C|$ and every completely monotonic function
$f \colon \{1, \dots, n\} \to \mathbb{R}$.

The inverse power laws $f(r) = r^{-\alpha}$ with $\alpha > 0$ are completely monotonic:
their derivatives obviously alternate in sign (i.e., they are completely monotonic
as functions of a continuous variable), and then the mean value theorem implies
that the same holds for finite differences. These potential functions arise naturally
because they describe path loss for electromagnetic waves in differing environments.
If $d$ is the minimal distance between codewords in $C$ and $k$ is the number of pairs of
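codewords at that distance, then as $\alpha \to \infty$, the energy $E_f(C)$ is asymptotic to a
constant multiple of $k\, d^{-\alpha}$, where the constant depends only on $|C|$ and on
whether pairs are counted with order.
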
A universally optimal code must therefore be an optimal code from the traditional 
Hamming perspective, because minimizing $E_f(C)$ requires maximizing $d$ when $|C|$
is fixed. However, we will see that universal optimality is a far stronger condition 
than ordinary optimality, since all pairwise distances are taken into account when 
computing the potential energy. 

The definition of universal optimality is analogous to that of Cohn and Kumar 
[CK07] in the continuous setting. They studied particle arrangements in spheres 
or projective spaces and showed that many beautiful configurations are universally
optimal, including the icosahedron, the $E_8$ root system, and the minimal vectors in
the Leech lattice. More generally, universal optimality plays a role in explaining 
the occurrence of certain remarkable symmetry groups in discrete mathematics and 
physics [C10].

For an alphabet of size 2, our definition nearly reduces to that of Cohn and
Kumar if we embed $\mathbb{F}_2^n$ as the vertices of a cube in the sphere $S^{n-1}$ and confine
the particles to these points. (The only difference is that our definition requires
optimality for a slightly larger class of potential functions.) For larger alphabets,
we know of no such embedding, and even when $q = 2$, this restriction qualitatively
changes the results from those in [CK07].

Bouman, Draisma, and van Leeuwaarden [BDL12] have independently studied 
energy minimization models on toric grids under the Lee metric. Their main theorem
implies universal optimality for certain checkerboard arrangements of particles filling 
half of the grid, but they do not investigate other codes. 

Universally optimal codes have robust energy minimization properties, which 
translate into good performance according to a broad range of measures. For 
example, consider the probability of an undetected error under the $q$-ary symmetric
channel with error probability $p$ for each symbol. For a code $C$ of block length $n$, if
we average over all codewords, then the probability of an undetected error equals
$E_f(C)$, where

\[ f(r) = (1-p)^{n-r} \left( \frac{p}{q-1} \right)^{r} = (1-p)^{n} \left( \frac{p}{(q-1)(1-p)} \right)^{r}. \]

This function is completely monotonic as long as the base of the exponential is at
most 1, i.e., $p \le (q-1)/q$, and this condition is natural because it simply says
each symbol is more likely to remain the same than to become any other fixed
symbol. Thus, when $p \le (q-1)/q$, a universally optimal code will always minimize
the chances of an undetected error. This application is known in the context of 
binomial moments (see Section V of [AB99]), and indeed binomial moments play an 
important role in the general theory of error-detecting codes [K07]. 
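
The identification of the undetected error probability with a potential energy can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: it takes the potential to be $f(r) = (1-p)^{n-r} (p/(q-1))^r$, and it compares the resulting energy with a direct enumeration over all received words, with probabilities computed symbol by symbol. The ternary repetition code and all function names are hypothetical choices.

```python
from itertools import product


def undetected_error_potential(r, n, q, p):
    """f(r): chance that a sent word becomes one specific word at distance r."""
    return (1 - p) ** (n - r) * (p / (q - 1)) ** r


def undetected_error_by_enumeration(code, q, p):
    """Average over sent codewords of P(received word is a different codeword)."""
    n = len(code[0])
    members = set(code)
    total = 0.0
    for sent in code:
        for received in product(range(q), repeat=n):
            if received == sent or received not in members:
                continue
            prob = 1.0
            for s, t in zip(sent, received):
                prob *= (1 - p) if s == t else p / (q - 1)
            total += prob
    return total / len(code)


def energy(code, f):
    """E_f(C) with the 1/|C| normalization, summing over ordered pairs."""
    dist = lambda x, y: sum(a != b for a, b in zip(x, y))
    return sum(f(dist(x, y)) for x in code for y in code if x != y) / len(code)


# Toy ternary code (an assumption for illustration): length-3 repetition code.
code = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
n, q, p = 3, 3, 0.1
via_energy = energy(code, lambda r: undetected_error_potential(r, n, q, p))
via_enumeration = undetected_error_by_enumeration(code, q, p)
```

Both computations give $2 \cdot (0.05)^3 = 2.5 \times 10^{-4}$ for this example.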

For a slightly more subtle application, consider maximum-likelihood decoding
for a binary-input discrete memoryless channel. The exact error probability for
decoding is subtle, but for relatively low-rate codes it is frequently estimated using
a union bound (see Theorem 7.5 in [M04, p. 153]). This bound shows that the error
probability for a random codeword from a code $C$ is at most $E_f(C)$ with $f(r) = \gamma^r$,
where $\gamma$ is the Bhattacharyya parameter. Specifically, if the channel has output
alphabet $A$ and transition probabilities $p(x \mid y)$, then

\[ \gamma = \sum_{x \in A} \sqrt{p(x \mid 0)\, p(x \mid 1)}. \]

Because $\gamma \le 1$ by the Cauchy-Schwarz inequality, the potential function $f$ is
completely monotonic, and thus a universally optimal code must minimize this
upper bound for the decoding error. We do not know whether such a code necessarily
minimizes the true decoding error, but minimizing a useful upper bound is nearly 
as good. 

Optimality is by no means limited to this particular union bound. For example, 
the same holds true for the AWGN channel with antipodal signaling. (Verifying 
complete monotonicity for the potential function requires a brief inductive proof, 
but it is not difficult.) This explains the observations of Ferrari and Chugg [FC03], 
who used linear programming bounds to verify that certain Hamming and Golay 
codes minimize this bound for a wide range of signal-to-noise ratios. Our results
prove that this always works and show how to generalize it to other codes. 

We will prove that all the codes listed in Table 1 of Section 7 are universally 
optimal. They include Hamming, simplex, Hadamard, conference, Golay, MDS, 
ovoid, and Nordstrom-Robinson codes and some of their variations, such as extended, 
shortened, and punctured codes. For the Hamming, Hadamard, Golay, MDS, and 
Nordstrom-Robinson codes, universal optimality is a theorem of Ashikhmin and 
Barg [AB99], as mentioned above. 

Universally optimal codes are common for short block lengths. For example, for 
binary codes it is not hard to complete a brute force search among all codes of 
block length at most 5. Up to isomorphism (i.e., translation and permutation of the 
coordinates), there is a unique universally optimal binary code of size N and block 
length $n$ whenever $n \le 4$ and $1 \le N \le 2^n$. For $n = 5$, such a code exists if and only
if $N \notin \{9, 12, 13, 14, 18, 19, 20, 23\}$, and it is unique except when $N = 5$ or $N = 27$,
in which case there are two isomorphism classes (see Lemma 9.1 for an explanation
of the $N \leftrightarrow 32 - N$ symmetry). Thus, a universal optimum needn't exist or be
unique if it exists. The abundance of universal optima in low dimensions is likely 
a special feature of the low-dimensional Hamming cube, and existence seems to
become increasingly rare as the block length increases. 

Our main technical tool for bounding energy is the linear program developed 
by Delsarte [D72], which was originally used to bound the size of codes given 
their minimum distance and whose continuous analogue was applied to energy 
minimization by Yudin [Y92]. We will call a code LP universally optimal if its 
universal optimality follows from these bounds, as occurs for all the cases in Table 1. 
Using complementary slackness conditions on the linear program, we provide various 
sufficient conditions for LP universal optimality (e.g., Proposition 6.2). One of
our key results is that LP universal optimality behaves well under duality, thereby 
allowing us to apply our criteria to many classes of codes. This duality result is best 
stated using the language of quasicodes, which are feasible vectors in the Delsarte
linear program for distance enumerators.

Proposition 1.2. Let $a$ be a quasicode. Then $a$ is universally optimal if and only
if its dual $a^\perp$ is.

We defer the precise definitions of quasicodes and duality to Section 4. For linear
codes, the notion of a quasicode dual agrees with the usual notion of the dual code. 

One result we found particularly surprising is that LP universally optimal codes 
continue to minimize energy even after we remove a single codeword. We know of 
no analogue of this property in the continuous setting. Note that it implies distance 
regularity (for each distance, every codeword has the same number of codewords at 
that distance). 

Theorem 1.3. Every LP universally optimal code is distance regular, and it remains 
universally optimal when any single codeword is removed. 

Removing a codeword yields a universal optimum, but the resulting code will 
not be LP universally optimal except in degenerate cases (namely when all the 
Hamming distances between codewords are maximal). Thus, this process cannot be 
iterated. 

In fact, there is a more general principle: for any potential function $f$ (not
necessarily completely monotonic), if the optimality of a code under $f$ follows from
the Delsarte linear program, then that code with any single codeword removed
also minimizes $f$ among all codes of its size. To prove these bounds, we use a
strengthened form of Delsarte's bounds, due to Ashikhmin and Simonis [AS98]. 

This paper is organized as follows. In Section 2 we review some background 
on error-correcting codes and Krawtchouk polynomials, which form the building 
blocks of the linear program. In Section 3 we set up the linear program for proving 
universal optimality and demonstrate how to apply the complementary slackness 
conditions. In Section 4 we introduce quasicodes, which are the distance distributions 
of hypothetical codes and contain the data used in the linear program, and we prove 
Proposition 1.2. In Section 5 we provide some tools for constructing dual solutions 
to the linear program that satisfy the complementary slackness conditions laid out 
earlier. In Section 6 we combine these tools to give various criteria for the universal 
optimality based on the distance and design parameters of codes. In Section 7 we 
apply these criteria to specific families of codes and demonstrate their universal 
optimality. In Section 8 we prove Theorem 1.3. Finally, in Section 9 we conclude 
with some further questions. 

2. Background on error-correcting codes 

In this section we review some background in coding theory and Krawtchouk 
polynomials. A good reference for this material is Chapter 5 of [MS77] . This section 
contains no new definitions or results, so experts can skip to Section 3. 

A code $C$ is simply a subset of $\mathbb{F}_q^n$. If a code $C \subseteq \mathbb{F}_q^n$ has size $N$, then we label its
type by $(n, N)_q$. If it is linear with dimension $k$, meaning that $C$ is a $k$-dimensional
linear subspace of $\mathbb{F}_q^n$, then we label its type by $[n, k]_q$. The subscript $q$ is generally
omitted if $q = 2$. We also write $(n, N, d)_q$ and $[n, k, d]_q$ to mean that the minimum
distance is at least $d$.

Let $K_k$ denote the $k$-th Krawtchouk polynomial, defined by

\[ K_k(x; n, q) = \sum_{j=0}^{k} (-1)^j (q-1)^{k-j} \binom{x}{j} \binom{n-x}{k-j}. \]

We will typically write $K_k(x)$, because $n$ and $q$ will be clear from the context. The
values of the Krawtchouk polynomials can be packaged into an $(n+1) \times (n+1)$
Krawtchouk matrix $K = (K_i(j))_{0 \le i, j \le n}$.

The distance distribution of a code $C \subseteq \mathbb{F}_q^n$ is the vector $(A_0, A_1, \dots, A_n)$, where

\[ A_i = \frac{1}{|C|} \left| \{ (x, y) \in C^2 : |x - y| = i \} \right| \quad \text{for } i = 0, 1, \dots, n. \]

The normalization factor of $1/|C|$ implies that $A_0 = 1$. If $C$ is a linear code,
then its distance distribution coincides with its weight distribution; i.e., $A_i =
|\{x \in C : |x| = i\}|$. For linear codes, the weight distribution $(A_0^\perp, \dots, A_n^\perp)$ of the
dual code $C^\perp$ is determined by that of $C$ by

\[ A_j^\perp = \frac{1}{|C|} \sum_{i=0}^{n} A_i K_j(i). \tag{2.1} \]

Let $a$ and $a^\perp$ be column vectors containing the distance distributions of $C$ and $C^\perp$,
respectively. Then (2.1) can be rewritten as

\[ a^\perp = \frac{1}{|C|} K a. \]

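The matrix $q^{-n/2} K$ is an involution; in other words, if $I$ denotes the identity
matrix, then

\[ K^2 = q^n I. \tag{2.2} \]

This identity can also be written as

\[ \sum_{k=0}^{n} K_i(k) K_k(j) = \begin{cases} q^n & \text{if } i = j, \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]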
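
The involution property is easy to check by machine for small parameters. The following Python sketch assumes the standard explicit formula $K_k(x) = \sum_j (-1)^j (q-1)^{k-j} \binom{x}{j} \binom{n-x}{k-j}$ and builds the Krawtchouk matrix with integer arithmetic; the parameters $n = 5$, $q = 3$ and the helper names are arbitrary illustrative choices.

```python
from math import comb


def krawtchouk(k, x, n, q):
    """K_k(x; n, q) = sum_j (-1)^j (q-1)^(k-j) C(x, j) C(n-x, k-j)."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))


def krawtchouk_matrix(n, q):
    """The (n+1) x (n+1) matrix with entry K_i(j) in row i, column j."""
    return [[krawtchouk(i, j, n, q) for j in range(n + 1)] for i in range(n + 1)]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


n, q = 5, 3
K = krawtchouk_matrix(n, q)
K_squared = matmul(K, K)
scaled_identity = [[q ** n if i == j else 0 for j in range(n + 1)]
                   for i in range(n + 1)]
```

Comparing `K_squared` with `scaled_identity` confirms $K^2 = q^n I$ for these parameters; the first two rows of `K` also reproduce $K_0(x) = 1$ and $K_1(x) = (q-1)n - qx$.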



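Krawtchouk polynomials are orthogonal polynomials with respect to the binomial
distribution $\mathrm{Binom}(n; (q-1)/q)$. In other words, they are orthogonal polynomials
with respect to the inner product on functions from $\{0, 1, \dots, n\}$ to $\mathbb{R}$ defined by

\[ \langle f, g \rangle = \frac{1}{q^n} \sum_{i=0}^{n} \binom{n}{i} (q-1)^i f(i) g(i) = \frac{1}{q^n} \sum_{u \in \mathbb{F}_q^n} f(|u|) g(|u|). \]

Specifically,

\[ \langle K_i, K_j \rangle = \binom{n}{i} (q-1)^i \delta_{ij}. \]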

This orthogonality allows us to extract coefficients from a linear combination of
Krawtchouk polynomials. Indeed, the coefficients $c_j$ of a function

\[ f(x) = \sum_{j=0}^{n} c_j K_j(x) \]

are given by

\[ c_j = \frac{\langle f, K_j \rangle}{\langle K_j, K_j \rangle}. \]

For any code $C \subseteq \mathbb{F}_q^n$ (not necessarily linear) with distance distribution $a =
(A_0, \dots, A_n)$, define its dual distribution $a^\perp = (A_0^\perp, \dots, A_n^\perp)$ by $a^\perp = Ka/|C|$; i.e.,

\[ A_j^\perp = \frac{1}{|C|} \sum_{i=0}^{n} A_i K_j(i) \quad \text{for } j = 0, 1, \dots, n. \]

If $C$ is linear then $a^\perp$ is the distance distribution of $C^\perp$, and thus $A_j^\perp \ge 0$.

In fact, as observed by Delsarte [D72], $A_j^\perp \ge 0$ holds for all codes, not just
linear codes. These Delsarte inequalities are the fundamental principle behind linear
programming bounds. In Section 8, we will review a proof in the course of proving 
slightly stronger inequalities. 
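
To make the Delsarte inequalities concrete, the following Python sketch computes the distance distribution and the dual distribution of a small nonlinear code with exact fractions. The code of five binary words is an arbitrary example chosen for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def krawtchouk(k, x, n, q):
    """K_k(x; n, q) via the standard explicit sum."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))


def distance_distribution(code):
    """A_i = (1/|C|) * #{(x, y) in C^2 : |x - y| = i}, as exact fractions."""
    n = len(code[0])
    counts = [0] * (n + 1)
    for x in code:
        for y in code:
            counts[sum(a != b for a, b in zip(x, y))] += 1
    return [Fraction(c, len(code)) for c in counts]


def dual_distribution(a, q):
    """A_j^perp = (1/|C|) * sum_i A_i K_j(i), where |C| = sum_i A_i."""
    n = len(a) - 1
    size = sum(a)
    return [sum(a[i] * krawtchouk(j, i, n, q) for i in range(n + 1)) / size
            for j in range(n + 1)]


# A nonlinear binary code of length 4, chosen arbitrarily for illustration.
code = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 0), (1, 1, 1, 1), (1, 0, 0, 1)]
a = distance_distribution(code)
a_dual = dual_distribution(a, q=2)
```

Every entry of `a_dual` is nonnegative even though the code is not linear, and the entries sum to $q^n / |C| = 16/5$.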

We say that $C \subseteq \mathbb{F}_q^n$ is a $t$-design if its dual distribution satisfies $A_j^\perp = 0$ for
$1 \le j \le t$. Using Krawtchouk polynomials as a basis for polynomials of degree at
most $t$, one can check that $C$ is a $t$-design if and only if every polynomial $f$ of
degree at most $t$ satisfies

\[ \frac{1}{|C|} \sum_{i=0}^{n} A_i f(i) = \frac{1}{q^n} \sum_{i=0}^{n} \binom{n}{i} (q-1)^i f(i). \tag{2.4} \]

Furthermore, $C$ is a $t$-design if and only if for every $t$ coordinate positions, the
restrictions of the codewords to those coordinate positions are equidistributed among
all possibilities (Theorem 4.4 in [D73]).



3. Linear programming 

In this section we formulate the linear program for energy minimization, which 
is the discrete analogue of Yudin's bound [Y92] (and has also been studied in the 
asymptotic regime [ABL01]).

Suppose $C \subseteq \mathbb{F}_q^n$ with $|C| = N$. The Delsarte inequalities give linear constraints
on the distance distribution of $C$. It follows that the value of the following linear
program in the variables $A_0, A_1, \dots, A_n$ gives a lower bound for the potential energy
$E_f(C)$:

\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{n} A_i f(i) \\
\text{subject to} \quad & \sum_{i=0}^{n} A_i K_j(i) \ge 0 \quad \text{for } j = 1, 2, \dots, n, \\
& A_0 + A_1 + \dots + A_n = N, \\
& A_0 = 1, \\
& A_i \ge 0 \quad \text{for } i = 1, 2, \dots, n.
\end{aligned}
\tag{3.1}
\]

In particular, if the optimum of the linear program equals $E_f(C)$, then $C$ minimizes
the $f$-potential energy among all codes of size $N$.

Definition 3.1. A code $C \subseteq \mathbb{F}_q^n$ is LP universally optimal if its distance distribution
$(A_0, \dots, A_n)$ is an optimal solution to the linear program (3.1) for every completely
monotonic potential function $f$.

Ashikhmin and Barg [AB99] call LP universally optimal codes extremal codes. 

Every LP universally optimal code is universally optimal, but not conversely. 
For instance, there is only one code of size three in $\mathbb{F}_2^2$ up to isomorphism, so it is
automatically universally optimal. This code has distance distribution (1,4/3,2/3), 
but the optimal solution in the linear program does better with (1, 1, 1), which is 
not the distance distribution of any actual code. 
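
The gap in this example can be verified directly. The sketch below checks that both vectors satisfy the constraints of (3.1) for $n = 2$, $q = 2$, $N = 3$, and compares their objective values for the decreasing linear potential $f(x) = n - x$, which is one convenient completely monotonic choice. It does not solve the linear program; it only confirms feasibility and the strict inequality between the two values. All names are illustrative.

```python
from fractions import Fraction
from math import comb


def krawtchouk(k, x, n, q):
    """K_k(x; n, q) via the standard explicit sum."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))


def is_lp_feasible(a, size, q):
    """Check the constraints of (3.1): A_0 = 1, A_i >= 0, sum = N, Delsarte."""
    n = len(a) - 1
    delsarte = all(sum(a[i] * krawtchouk(j, i, n, q) for i in range(n + 1)) >= 0
                   for j in range(1, n + 1))
    return a[0] == 1 and min(a) >= 0 and sum(a) == size and delsarte


def lp_objective(a, f):
    """The objective of (3.1): sum over i >= 1 of A_i f(i)."""
    return sum(a[i] * f(i) for i in range(1, len(a)))


n, q, size = 2, 2, 3
code_distribution = [Fraction(1), Fraction(4, 3), Fraction(2, 3)]
lp_vector = [Fraction(1), Fraction(1), Fraction(1)]
f = lambda x: n - x  # decreasing linear potential, assumed for illustration

code_value = lp_objective(code_distribution, f)
lp_value = lp_objective(lp_vector, f)
```

For this potential the code has energy $4/3$, while the vector $(1, 1, 1)$ attains the smaller value $1$.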

For any fixed code, checking whether it is universally optimal or LP universally 
optimal is a finite problem, since we can write down a basis for the cone of completely 
monotonic functions. 

Lemma 3.2. The cone of completely monotonic functions on $\{0, 1, \dots, n\}$ is the
nonnegative linear span of the fundamental potential functions $f_0, f_1, \dots, f_n$ defined
by

\[ f_j(x) = \binom{n-x}{j}. \]

The potential energy with respect to $f_j$ is exactly the $j$-th binomial moment of the
distance distribution, as defined by Ashikhmin and Barg [AB99]. Thus, Lemma 3.2 
shows that a code is universally optimal if and only if its binomial moments are 
minimal, so the results of [AB99] can be restated in terms of universal optimality. 

The statement of Lemma 3.2 includes 0 in the domain of $f$, which will be
notationally convenient when we consider duality in Section 4. We can always extend
a completely monotonic function $f$ from $\{1, 2, \dots, n\}$ to $\{0, 1, \dots, n\}$ by setting $f(0)$
to be a sufficiently large value that the complete monotonicity conditions continue
to hold. Of course, the value $f(0)$ is irrelevant for computing energy.

We say that a function $f \colon \{a, a+1, \dots, b\} \to \mathbb{R}$ is absolutely monotonic if all its
finite differences are nonnegative; i.e., $\Delta^k f(i) \ge 0$ whenever $k \ge 0$ and $a \le i \le b - k$.

Proof of Lemma 3.2. By making a change of variables from $x$ to $n - x$, it suffices
to prove that the functions $g_j(x) = f_j(n - x) = \binom{x}{j}$, for $j = 0, 1, \dots, n$, form
a basis for the cone of absolutely monotonic functions on $\{0, 1, \dots, n\}$. Indeed,
$\Delta^r g_j(x) = g_{j-r}(x)$ for $r \le j$, and $\Delta^r g_j(x) = 0$ for $r > j$, so each $g_j$ is absolutely
monotonic. Conversely, every function $g \colon \{0, 1, \dots, n\} \to \mathbb{R}$ can be written as a
polynomial of degree at most $n$, and therefore

\[ g(x) = \sum_{j=0}^{n} \Delta^j g(0) \binom{x}{j} = \sum_{j=0}^{n} \Delta^j g(0)\, g_j(x) \]

by the discrete calculus analogue of the Taylor series expansion. If $g$ is absolutely
monotonic, then $\Delta^j g(0) \ge 0$ for all $j$, as desired. □
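
The proof is constructive, and the decomposition can be sketched in a few lines of Python. The example below assumes the fundamental potential functions are $f_j(x) = \binom{n-x}{j}$ and uses $f(x) = 1/(x+1)$ on $\{0, \dots, 6\}$ as a sample completely monotonic function; both the sample and the function name are illustrative choices.

```python
from fractions import Fraction
from math import comb


def fundamental_coefficients(values):
    """Coefficients b_j with f(x) = sum_j b_j * C(n - x, j) on {0, ..., n}.

    `values` lists f(0), ..., f(n); b_j is the j-th forward difference at 0
    of the reversed function g(x) = f(n - x).
    """
    current = list(reversed(values))
    coefficients = []
    while current:
        coefficients.append(current[0])
        current = [current[i + 1] - current[i] for i in range(len(current) - 1)]
    return coefficients


n = 6
# Assumed example: f(x) = 1/(x + 1), completely monotonic on {0, ..., n}.
values = [Fraction(1, x + 1) for x in range(n + 1)]
b = fundamental_coefficients(values)
rebuilt = [sum(b[j] * comb(n - x, j) for j in range(n + 1))
           for x in range(n + 1)]
```

The coefficients in `b` are all nonnegative, and `rebuilt` reproduces `values` exactly, as the discrete Taylor expansion predicts.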

Therefore, to check that a given code is universally optimal, it suffices to check 
that it is universally optimal for the fundamental potential functions in Lemma 3.2. 
Checking whether a code is LP universally optimal can be done efficiently using any
linear programming package (combined with rational arithmetic for rigorous proofs). 
However, it is generally infeasible to search through all possible codes of a given 
size, so checking whether a given code is universally optimal seems quite difficult if 
we do not know that it is LP universally optimal, even though it is a finite problem. 

The [7, 4, 3] binary Hamming code has distance distribution (1, 0, 0, 7, 7, 0, 0, 1). 
It can be checked that this vector is an optimal solution to (3.1) for each of the 
fundamental potential functions, so this code is LP universally optimal. However,
proofs by explicit computation are not illuminating and cannot deal with infinite 
families of codes. In the rest of this paper, we will develop more conceptual 
techniques for proving universal optimality. 
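
The distance distribution quoted above can be reproduced by brute force. The following Python sketch enumerates the 16 codewords from one systematic generator matrix for the $[7, 4, 3]$ Hamming code; this particular matrix is an assumed choice, and any equivalent generator matrix gives the same distribution.

```python
from fractions import Fraction
from itertools import product

# One systematic generator matrix for the [7, 4, 3] Hamming code (assumed choice).
GENERATOR = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def span(generator):
    """All F_2-linear combinations of the generator rows."""
    n = len(generator[0])
    return [tuple(sum(c * row[i] for c, row in zip(coeffs, generator)) % 2
                  for i in range(n))
            for coeffs in product((0, 1), repeat=len(generator))]


def distance_distribution(code):
    """A_i = (1/|C|) * #{(x, y) in C^2 : |x - y| = i}, as exact fractions."""
    n = len(code[0])
    counts = [0] * (n + 1)
    for x in code:
        for y in code:
            counts[sum(a != b for a, b in zip(x, y))] += 1
    return [Fraction(c, len(code)) for c in counts]


hamming = span(GENERATOR)
distribution = distance_distribution(hamming)
```

The resulting `distribution` equals $(1, 0, 0, 7, 7, 0, 0, 1)$.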

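The linear program (3.1) has the following dual with variables $c_0, \dots, c_n$:

\[
\begin{aligned}
\text{maximize} \quad & N c_0 - \sum_{j=0}^{n} c_j K_j(0) \\
\text{subject to} \quad & \sum_{j=0}^{n} c_j K_j(i) \le f(i) \quad \text{for } i = 1, 2, \dots, n, \\
& c_j \ge 0 \quad \text{for } j = 1, 2, \dots, n, \\
& c_0 \text{ is unrestricted.}
\end{aligned}
\tag{3.2}
\]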

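Specifically, given feasible solutions to (3.1) and (3.2), if we set $h(i) = \sum_{j=0}^{n} c_j K_j(i)$, then

\[
\begin{aligned}
\sum_{i=1}^{n} A_i f(i)
&= \sum_{i=1}^{n} A_i \bigl( f(i) - h(i) \bigr) + \sum_{j=0}^{n} \sum_{i=1}^{n} c_j A_i K_j(i) \\
&= \sum_{i=1}^{n} A_i \bigl( f(i) - h(i) \bigr) + \sum_{j=0}^{n} \sum_{i=0}^{n} c_j A_i K_j(i) - \sum_{j=0}^{n} c_j K_j(0) \\
&= \sum_{i=1}^{n} A_i \bigl( f(i) - h(i) \bigr) + \sum_{j=1}^{n} \sum_{i=0}^{n} c_j A_i K_j(i) + N c_0 - \sum_{j=0}^{n} c_j K_j(0) \\
&\ge N c_0 - \sum_{j=0}^{n} c_j K_j(0).
\end{aligned}
\]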

Thus, we can restate the dual linear program as follows. 

Proposition 3.3. Suppose $f \colon \{1, \dots, n\} \to \mathbb{R}$ is any function, $h \colon \{0, 1, \dots, n\} \to \mathbb{R}$
satisfies

\[ h(i) \le f(i) \quad \text{for } i = 1, 2, \dots, n, \]

and there exist $c_0, c_1, \dots, c_n$ with $c_j \ge 0$ for $j \ge 1$ such that

\[ h(i) = \sum_{j=0}^{n} c_j K_j(i) \quad \text{for } i = 0, 1, \dots, n. \]

Then every code $C \subseteq \mathbb{F}_q^n$ with $|C| = N$ has $f$-potential energy at least $N c_0 - h(0)$.

In the derivation above, complementary slackness tells us that the bound is sharp if
and only if the following two conditions hold: $f(i) = h(i)$ whenever $i \ge 1$ and $A_i > 0$,
and $c_j = 0$ whenever $j \ge 1$ and $\sum_{i=0}^{n} A_i K_j(i) > 0$. Because $A_j^\perp = \frac{1}{N} \sum_{i=0}^{n} A_i K_j(i)$,
we arrive at the following criterion for optimality.

Proposition 3.4. Let $C \subseteq \mathbb{F}_q^n$ be a code with distance distribution $(A_i)$ and dual
distribution $(A_j^\perp)$, and let $f \colon \{1, \dots, n\} \to \mathbb{R}$ be any function. Suppose that there
exists a function $h \colon \{0, 1, \dots, n\} \to \mathbb{R}$ satisfying

(a) $h(i) \le f(i)$ for $i = 1, 2, \dots, n$, with equality whenever $A_i > 0$, and

(b) there exist $c_0, c_1, \dots, c_n$ so that $h(i) = \sum_{j=0}^{n} c_j K_j(i)$ for all $i$ and $c_j \ge 0$
for $j = 1, 2, \dots, n$, with equality whenever $A_j^\perp > 0$.

Then $C$ minimizes $f$-potential energy; that is, $E_f(C) \le E_f(C')$ whenever $C' \subseteq \mathbb{F}_q^n$
and $|C'| = |C|$.

Let us give an example of how to apply Proposition 3.4 to prove universal
optimality. We say that a code with distance distribution $(A_0, \dots, A_n)$ is supported
at $S \subseteq \{1, 2, \dots, n\}$ if $A_i = 0$ whenever $i \notin \{0\} \cup S$.

Proposition 3.5. If $C \subseteq \mathbb{F}_q^n$ is a code that is a 1-design and is supported at either
a single integer or two consecutive integers, then $C$ is LP universally optimal. In
fact, $C$ minimizes every convex, weakly decreasing potential function.

Proof. Let $f$ be a convex and weakly decreasing potential function, and suppose
that $C$ is supported at $\{a\}$ or $\{a - 1, a\}$. Let $h(i)$ be the linear function that
agrees with $f(i)$ at $i \in \{a - 1, a\}$. Then $h(i) \le f(i)$ for all $i > 0$ since $f$ is convex,
with equality at the nonzero supports of $C$. Thus, part (a) in Proposition 3.4 is
satisfied.

Since $h(i)$ is linear, $h(i) = c_0 K_0(i) + c_1 K_1(i)$ for some $c_0$ and $c_1$. We have
$K_0(x) = 1$ and $K_1(x) = (q - 1)n - qx$. The slope of $h$ is nonpositive since $f$
is weakly decreasing, so $c_1 \ge 0$. Since $C$ is a 1-design, $A_1^\perp = 0$. So part (b) in
Proposition 3.4 is also satisfied. This shows that $C$ minimizes $f$-potential energy. □

Example 3.6. The binary simplex code [MS77, Ch. 1, §9] is a $[2^r - 1, r, 2^{r-1}]$ linear
code, whose basis can be given by $\{v_1, \dots, v_r\}$, where the $i$-th coordinate of $v_j$ is 1 if
the $j$-th rightmost binary digit of $i$ is 1 and 0 otherwise. The only nonzero distance
in the simplex code is $2^{r-1}$ (hence the name "simplex"), so the code is supported
at one distance. The dual of the simplex code is a Hamming code with minimum 
distance 3, which implies that the simplex code is a 2-design. Thus, the simplex 
code is LP universally optimal by Proposition 3.5. 
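
The argument of Proposition 3.5 can be traced numerically for the simplex code with $r = 3$. The sketch below builds the code from the basis described above (with bits indexed from 0), constructs the linear function $h$ through $(a-1, f(a-1))$ and $(a, f(a))$ for the sample convex decreasing potential $f(i) = 1/i$, and compares the dual bound $N c_0 - h(0)$ of Proposition 3.3 with the actual energy. The choice of $f$ and all variable names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

r = 3
n = 2 ** r - 1
# v_j has a 1 in coordinate i (1 <= i <= n) when bit j of i is set (j from 0).
basis = [tuple((i >> j) & 1 for i in range(1, n + 1)) for j in range(r)]
code = [tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) % 2 for i in range(n))
        for coeffs in product((0, 1), repeat=r)]

# Distance distribution with the 1/|C| normalization.
A = [Fraction(0)] * (n + 1)
for x in code:
    for y in code:
        A[sum(s != t for s, t in zip(x, y))] += Fraction(1, len(code))

# First dual coefficient, using K_1(i) = n - 2i for q = 2 (1-design check).
dual_first = sum(A[i] * (n - 2 * i) for i in range(n + 1)) / len(code)

f = lambda i: Fraction(1, i)          # assumed convex, decreasing potential
a = 2 ** (r - 1)                      # the single nonzero distance
slope = f(a) - f(a - 1)
h = lambda i: f(a) + slope * (i - a)  # line through (a-1, f(a-1)) and (a, f(a))

# Write h = c0*K_0 + c1*K_1 with K_0 = 1 and K_1(i) = n - 2i.
c1 = -slope / 2
c0 = h(0) - c1 * n

energy = sum(A[i] * f(i) for i in range(1, n + 1))
dual_bound = len(code) * c0 - h(0)
```

In this instance the energy and the dual bound both equal $7/4$, so the bound is sharp, as complementary slackness predicts.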

4. Quasicodes and duality

In this section we show that LP universal optimality is preserved under duality. 
This key fact helps us to prove that many well-known families of codes are universally 
optimal, since sometimes it is easier to apply Proposition 3.4 to the dual distribution 
of a code. This duality-invariance does not seem to have an analogue for universally 
optimal spherical or projective codes, which were studied in [CK07]. Indeed, for 
error-correcting codes, both the distance distribution and the dual distribution are
indexed by {0, 1, . . . , n}. On the other hand, in the continuous setting the distance 
distribution is indexed by a continuous variable while the dual distribution remains 
discrete, so the symmetry is broken. 

The linear program (3.1) can only discern codes by their distance distributions, 
so we isolate this information and call it a quasicode. 

Definition 4.1. A quasicode $a$ of length $n$ and size $N$ over $\mathbb{F}_q$ is a real vector
$(A_0, A_1, \dots, A_n)$ satisfying the constraints of the linear program (3.1). In other
words,

\[ a \ge 0, \quad Ka \ge 0, \quad \sum_{i=0}^{n} A_i = N, \quad \text{and} \quad A_0 = 1. \]

Here $K = (K_i(j))_{0 \le i, j \le n}$, and $a \ge 0$ means that all coordinates of $a$ are nonnegative.
We write $|a|$ for the size $N$ of the quasicode. The dual of $a$ is defined to be the
quasicode

\[ a^\perp = \frac{1}{|a|} K a. \]

For every code $C \subseteq \mathbb{F}_q^n$, its distance distribution is a quasicode $a$ with $|a| =
|C|$. Furthermore, if $C$ is a linear code, then its dual linear code $C^\perp$ has distance
distribution $a^\perp$. Of course, not every quasicode comes from a code; for example,
the entries of a quasicode needn't even be rational numbers.

Using the identity $K^2 = q^n I$ from (2.2), we see that $a^\perp$ is another quasicode
with $|a^\perp|\,|a| = q^n$. Indeed, letting the superscript $t$ denote the transpose and $\mathbf{1}$ the
all-ones vector, we have $|a^\perp|\,|a| = \mathbf{1}^t a^\perp |a| = \mathbf{1}^t K a$, and note that $\mathbf{1}^t K$ is the first
row of $K^2 = q^n I$. It also follows from $K^2 = q^n I$ and $|a^\perp|\,|a| = q^n$ that $a^{\perp\perp} = a$.
The dual operator is also known as the MacWilliams transform [MS77, p. 137].
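
These identities can be illustrated with exact arithmetic. The Python sketch below applies the dual operator $a \mapsto Ka/|a|$ to the distance distribution of the $[7, 4, 3]$ Hamming code and to the fractional quasicode $(1, 1, 1)$ of length 2 and size 3; the function names are hypothetical, and the Krawtchouk values come from the standard explicit sum.

```python
from fractions import Fraction
from math import comb


def krawtchouk(k, x, n, q):
    """K_k(x; n, q) via the standard explicit sum."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))


def quasicode_dual(a, q):
    """Return a^perp = K a / |a|, where |a| is the sum of the entries of a."""
    n = len(a) - 1
    size = sum(a)
    return [Fraction(sum(a[i] * krawtchouk(j, i, n, q) for i in range(n + 1))) / size
            for j in range(n + 1)]


hamming = [1, 0, 0, 7, 7, 0, 0, 1]      # [7, 4, 3] Hamming code
simplex = quasicode_dual(hamming, 2)

# A quasicode with n = 2, N = 3 that is not the distribution of any code.
fractional = [Fraction(1), Fraction(1), Fraction(1)]
fractional_dual = quasicode_dual(fractional, 2)
```

The dual of the Hamming distribution is $(1, 0, 0, 0, 7, 0, 0, 0)$, the distribution of the $[7, 3, 4]$ simplex code, and applying the operator twice returns the original vector in both examples.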

Given a potential function $f \colon \{0, 1, \dots, n\} \to \mathbb{R}$, let $\mathbf{f}$ be the column vector
$(f(0), f(1), \dots, f(n))$. (As in Lemma 3.2, it is convenient to include $f(0)$.) Minimizing
the $f$-potential energy of a quasicode $a$ amounts to minimizing the inner
product

\[ \mathbf{f}^t a = \sum_{i=0}^{n} f(i) A_i. \]

This quantity is slightly different from the definition of $f$-potential energy that we
gave earlier in that we now include the $f(0)$ term, but it does not affect the notion
of universal optimality since $A_0 = 1$.

Definition 4.2. For any potential function $f \colon \{0, 1, \dots, n\} \to \mathbb{R}$, we say that the
quasicode $a$ of length $n$ minimizes $f$-potential energy if $\mathbf{f}^t a \le \mathbf{f}^t b$ for every quasicode
$b$ of length $n$ with $|a| = |b|$. We say that $a$ is a universally optimal quasicode if it
minimizes every completely monotonic potential energy function.

Lemma 3.2 tells us that we just have to check the fundamental potential functions
$f_j(x) = \binom{n-x}{j}$ for $j = 0, 1, \dots, n$.

Since quasicodes are precisely the feasible solutions to the Delsarte linear
program (3.1), we have the following equivalence.

Proposition 4.3. A code is LP universally optimal if and only if its distance
distribution is a universally optimal quasicode.

Universally optimal quasicodes often exist in low dimensions; for example, they 
exist for all $n \le 11$ and $1 \le N \le 2^n$. Nevertheless, it is not true that they exist for all
lengths n and sizes N. One counterexample is (n, N, q) = (12, 32, 2), as one can check 
that no quasicode minimizes all the fundamental potential functions simultaneously. 
To do so, note that for each fundamental potential function, minimizing energy puts 
a linear constraint on the quasicode. There is a unique vector that satisfies all these 
constraints, namely 

(1, 0, 0, 0, 0, 15/2, 79/4, 0, -5/4, 15/2, -27/4, 5, -3/4). 

However, the negative entries mean it is not a quasicode. Even if we only consider 
$q = 2$ and $N = 2^{n/2}$, corresponding to the parameters of a self-dual binary code,
there still may not exist universally optimal quasicodes; the first counterexample
occurs at $n = 28$ (or $n = 27$ if one does not require $N$ to be integral).

Given a quasicode $(A_0, \dots, A_n)$ with dual $(A_0^\perp, \dots, A_n^\perp)$, we call $\{i > 0 : A_i \ne 0\}$
the support of the quasicode, and $\{i > 0 : A_i^\perp \ne 0\}$ the dual support of the quasicode.
Given an actual code $C \subseteq \mathbb{F}_q^n$, we can also talk about its support and dual support,
referring to the corresponding quasicode.

We say that a quasicode is a $t$-design if its MacWilliams transform satisfies $A_1^\perp =
A_2^\perp = \dots = A_t^\perp = 0$. Thus a code is a $t$-design if and only if its underlying quasicode
is a $t$-design.

The complementary slackness from Proposition 3.4 applies also to quasicodes. 
Since quasicodes are equivalent to feasible solutions to the linear program (3.1),
complementary slackness is a necessary and sufficient condition for quasicode universal
optimality. 

Proposition 4.4. Let $a = (A_0, A_1, \dots, A_n)$ be a quasicode and
$a^\perp = (A_0^\perp, A_1^\perp, \dots, A_n^\perp)$ its dual. For any potential function
$f\colon \{1, 2, \dots, n\} \to \mathbb{R}$, the quasicode $a$ minimizes $f$-potential energy
if and only if $(A_i)$ and $(A_j^\perp)$ satisfy the conditions of Proposition 3.4. Namely,
if $S$ is the support of $a$ and $S^\perp$ is the support of $a^\perp$, then $a$ minimizes
$f$-potential energy if and only if there exists a function
$h\colon \{0, 1, \dots, n\} \to \mathbb{R}$ satisfying

(a) $h(i) \le f(i)$ for $i = 1, 2, \dots, n$, with equality whenever $i \in S$, and

(b) there exist $c_0, c_1, \dots, c_n$ so that $h(i) = \sum_{j=0}^{n} c_j K_j(i)$ for all $i$
and $c_j \ge 0$ for all $j = 1, 2, \dots, n$, with equality whenever $j \in S^\perp$.

Note that the conditions above do not take into account the actual values of the
quasicode, only its support and dual support.

Proposition 4.5. Whether a quasicode is universally optimal depends only on its 
length, size, support, and dual support. 

Note that a universally optimal quasicode is unique (given its length and size)
if it exists, because the energies with respect to the $n + 1$ fundamental potential
functions put $n + 1$ constraints on the quasicode, which are linearly independent
because there is one potential function of each degree.

Given the support $S$ and the dual support $S^\perp$ of a quasicode, for any potential
function $f$ we can set up a linear program using variables $c_0, c_1, \dots, c_n$ and
constraints specified by Proposition 4.4, so that the quasicode minimizes $f$-potential
energy if and only if the linear program has a feasible solution. Varying $f$ over all
fundamental potential functions in Lemma 3.2, we get a finite procedure for deciding
whether a quasicode is universally optimal using only its support and dual support.

Furthermore, note that if a is a universally optimal quasicode, and b is another 
quasicode of the same length and size whose support and dual support are respectively 
contained in those of a, then b is also universally optimal, since the same h in 
Proposition 3.4 that works for a also works for b. Thus, b = a. 

Now we prove the key observation, stated in the introduction, that universal 
optimality in quasicodes is preserved under duality. We repeat the statement here 
for convenience. 

Proposition 1.2. Let $a$ be a quasicode. Then $a$ is universally optimal if and only
if its dual $a^\perp$ is.

To prove the proposition, we first show that the space of completely monotonic
functions is also preserved under duality.

Lemma 4.6. If $\mathbf{f}$ represents a completely monotonic function, then so does $K^t \mathbf{f}$.

This lemma further justifies completely monotonic functions as a natural class of 
potential functions to consider for energy minimization. 

Proof. Recall from Lemma 3.2 that the functions $f_j(x) = \binom{n-x}{j}$ form a basis for
the cone of completely monotonic functions. Let $\mathbf{f}_j$ denote the column vector
corresponding to $f_j$. To see that $K^t$ leaves the cone of completely monotonic
functions invariant, we will use the identity

(4.1) $K^t \mathbf{f}_j = q^{n-j} \mathbf{f}_{n-j}$.

Note that it can be rewritten as

(4.2) $\sum_{k=0}^{n} K_k(i) \binom{n-k}{j} = q^{n-j} \binom{n-i}{n-j}$.

We use the following generating function for Krawtchouk polynomials (see [MS77,
p. 151]):

$\sum_{k=0}^{n} K_k(i) z^k = (1 + (q-1)z)^{n-i} (1 - z)^i$.

By setting $z = (1 + w)^{-1}$ we can rewrite it as

$\sum_{k=0}^{n} K_k(i) (w + 1)^{n-k} = (w + q)^{n-i} w^i$.

Then (4.2) follows from comparing the coefficient of $w^j$ in the above formula. □
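As a sanity check on the identity just proved, here is a minimal Python sketch that tests $\sum_{k} K_k(i)\binom{n-k}{j} = q^{n-j}\binom{n-i}{n-j}$ exhaustively for small parameters. The function names are ours, and the Krawtchouk polynomial is assumed to be given by the standard explicit sum $K_k(x) = \sum_j (-1)^j (q-1)^{k-j} \binom{x}{j}\binom{n-x}{k-j}$.

```python
from math import comb

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def identity_holds(n, q):
    """Check sum_k K_k(i) C(n-k, j) == q^(n-j) C(n-i, n-j) for all 0 <= i, j <= n."""
    return all(
        sum(krawtchouk(k, i, n, q) * comb(n - k, j) for k in range(n + 1))
        == q ** (n - j) * comb(n - i, n - j)
        for i in range(n + 1) for j in range(n + 1))

# Exhaustive check over a few small lengths and alphabet sizes.
checked = [(n, q) for n in range(1, 8) for q in (2, 3, 4) if identity_holds(n, q)]
print(len(checked))
```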

Proof of Proposition 1.2. Since the dual operator is an involution, it suffices to
prove that if $a$ is universally optimal, then so is $a^\perp$. Every quasicode can be written
as $b^\perp$ for some quasicode $b$. So it suffices to show that
$\mathbf{f}^t a^\perp \le \mathbf{f}^t b^\perp$ for every completely monotonic potential
$\mathbf{f}$ whenever $|a| = |b|$. By Lemma 4.6, $K^t \mathbf{f}$ is also
completely monotonic. By the universal optimality of $a$ we have

$|a|\, \mathbf{f}^t a^\perp = \mathbf{f}^t K a = (K^t \mathbf{f})^t a \le (K^t \mathbf{f})^t b = \mathbf{f}^t K b = |b|\, \mathbf{f}^t b^\perp$.

Therefore $a^\perp$ is universally optimal. □

One can also prove Proposition 1.2 using Proposition 4.4 by constructing an 
appropriate auxiliary function h. 

Recall that a code $C \subseteq \mathbb{F}_q^n$ is LP universally optimal if and only if its distance
distribution is a universally optimal quasicode. So to prove that a given code is 
universally optimal, it suffices to show that its dual distribution is a universally 
optimal quasicode. For linear codes, the dual distribution corresponds to the dual 
code. 

Corollary 4.7. A linear code $C$ is LP universally optimal if and only if its dual
is also LP universally optimal.

Example 4.8. We saw in Example 3.6 that the $[2^r - 1, r, 2^{r-1}]$ binary simplex code
is LP universally optimal. The dual of the binary simplex code is the
$[2^r - 1, 2^r - r - 1]$ Hamming code. Since the binary simplex code is LP universally
optimal, it follows that the Hamming code is also LP universally optimal (as proved by
Ashikhmin and Barg in Example 5 of [AB99]).

We say that a linear code $C$ is universally optimal among linear codes if
$E_f(C) \le E_f(D)$ for every completely monotonic function $f$ and every linear code $D$
of the same block length and dimension as $C$. There is also a version of the duality
result for linear codes.

Proposition 4.9. A linear code C is universally optimal among linear codes if and 
only if its dual is also universally optimal among linear codes. 

This generalizes Theorem 2.60 in [K07], which is the special case of error detection 
(i.e., exponential potential functions). 

Proof. Since $(C^\perp)^\perp = C$, it suffices to prove the only if direction. Suppose that
$C$ is universally optimal among linear codes. It suffices to prove that
$E_f(C^\perp) \le E_f(D^\perp)$ for every completely monotonic function $f$ and every linear
code $D$ of the same block length and dimension as $C$. This amounts to showing that
$\mathbf{f}^t K a \le \mathbf{f}^t K b$, where $a$ and $b$ are the weight distributions of $C$
and $D$, respectively. Since $C$ is universally optimal among linear codes, we have
$E_g(C) \le E_g(D)$, where $\mathbf{g} = K^t \mathbf{f}$ is completely monotonic by Lemma 4.6.
So we have $\mathbf{f}^t K a \le \mathbf{f}^t K b$, as desired. □

However, we do not know whether it is true that a linear code $C$ is universally
optimal outright (among all codes of the same size) if and only if $C^\perp$ is.

5. Constructing dual solutions 

Proposition 3.4 gives us a tool for proving universal optimality for quasicodes. To 
apply it, we need to construct an auxiliary function h satisfying two key requirements. 
First, we need $h(i) \le f(i)$ for all $i$, with equality whenever $i$ is in the support of the
quasicode. Second, we must be able to write $h$ as a nonnegative linear combination
of Krawtchouk polynomials, a property we call positive definiteness, and we may only
use the Krawtchouk polynomials $K_j$ for $j$ not in the dual support of the code. We
will often take $h$ to be a nonnegative linear combination of low degree Krawtchouk
polynomials. For instance, when the code is an $s$-design, we can try to construct $h$
as a polynomial of degree at most $s$, as such polynomials can always be written as a
linear combination of $K_0, K_1, \dots, K_s$. This procedure is thus related to polynomial
interpolation. 

5.1. Positive definite functions. For every function
$h\colon \{0, 1, \dots, n\} \to \mathbb{R}$, we can find $c_0, c_1, \dots, c_n$ such that

$h(i) = \sum_{j=0}^{n} c_j K_j(i)$ for $i = 0, 1, \dots, n$.

Indeed, the coefficients can be extracted using (2.3). Alternatively, if $\mathbf{h}$ is the
column vector with entries $(h(i))_{0 \le i \le n}$, then $\mathbf{h}^t = \mathbf{c}^t K$, so that
$q^n \mathbf{c}^t = \mathbf{h}^t K$ as $K^2 = q^n I$. Call $c_j$ the Krawtchouk coefficients of $h$.
For any $0 \le s \le n$ the Krawtchouk polynomials $K_0, K_1, \dots, K_s$ span the polynomials
of degree at most $s$, so that if $h$ is given by a polynomial of degree $s$, then $c_j = 0$
for $j > s$.
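The extraction of Krawtchouk coefficients via $q^n \mathbf{c}^t = \mathbf{h}^t K$ can be sketched in a few lines of Python. The helper names and the sample polynomial $(x - 2)^2$ with $n = 6$, $q = 3$ are our own illustrative choices; exact rational arithmetic is used so that vanishing coefficients are detected reliably.

```python
from fractions import Fraction
from math import comb

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def krawtchouk_coefficients(h, q):
    """Given values h[0..n], return c with h(i) = sum_j c_j K_j(i).

    Uses c_j = q^(-n) * sum_i h(i) K_i(j), which follows from K^2 = q^n I.
    """
    n = len(h) - 1
    return [Fraction(sum(h[i] * krawtchouk(i, j, n, q) for i in range(n + 1)), q ** n)
            for j in range(n + 1)]

n, q = 6, 3
h = [(x - 2) ** 2 for x in range(n + 1)]          # a polynomial of degree 2
c = krawtchouk_coefficients(h, q)
# Rebuild h from its expansion to confirm the coefficients are correct.
rebuilt = [sum(c[j] * krawtchouk(j, i, n, q) for j in range(n + 1))
           for i in range(n + 1)]
print(c)
```

Since $h$ has degree 2, only $c_0$, $c_1$, and $c_2$ can be nonzero.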

Now we consider the requirement $c_j \ge 0$ from Proposition 3.4.

Definition 5.1. Let $h\colon \{0, 1, \dots, n\} \to \mathbb{R}$ be a function. We say that $h$ is
positive definite if its expansion in terms of Krawtchouk polynomials has nonnegative
coefficients. That is, if there exist $c_0, c_1, \dots, c_n \ge 0$ such that

(5.1) $h(i) = \sum_{j=0}^{n} c_j K_j(i)$ for $i = 0, 1, \dots, n$.

The name "positive definite" is used because such functions give rise to positive
definite kernels via $(x, y) \mapsto h(|x - y|)$, meaning that
$(h(|x_i - x_j|))_{1 \le i, j \le m}$ is a positive semidefinite matrix whenever
$x_1, \dots, x_m \in \mathbb{F}_q^n$. Here Krawtchouk polynomials
play the role of zonal spherical functions. We refer the reader to [CS99, Ch. 9] for
more discussion of this connection. See also [CK07, §2.2] for a review of the theory
in the context of ultraspherical polynomials (instead of Krawtchouk polynomials)
and Theorem 2 in [DL98] for the discrete case.

Proposition 4.4 does not actually require $c_0 \ge 0$. However, there seems to be
little harm in assuming it. Doing so allows us to use properties of positive definite 
functions such as the following lemma. 

Lemma 5.2. The product of any two positive definite functions is also positive 
definite. 

This standard lemma follows immediately from the Schur product theorem 
(Theorem 7.5.3 in [HJ13]). 

Lemma 5.3. The function $h(x) = a - x$ is positive definite if and only if
$a \ge (q-1)n/q$.

Proof. We have $K_1(x) = (q-1)n - qx$, so

$h = \left(a - \frac{(q-1)n}{q}\right) + \frac{1}{q} K_1$. □

Corollary 5.4. If $a_1, a_2, \dots, a_s \ge (q-1)n/q$, then
$h(x) = (a_1 - x)(a_2 - x) \cdots (a_s - x)$ is positive definite.
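Lemma 5.3 and Corollary 5.4 are easy to test numerically. The following Python sketch (helper names are ours; the parameters $n = 9$, $q = 3$, for which the threshold $(q-1)n/q$ equals 6, are an arbitrary illustrative choice) computes Krawtchouk coefficients exactly and checks nonnegativity.

```python
from fractions import Fraction
from math import comb

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def krawtchouk_coefficients(h, q):
    """Coefficients c_j = q^(-n) sum_i h(i) K_i(j) of the expansion h = sum_j c_j K_j."""
    n = len(h) - 1
    return [Fraction(sum(h[i] * krawtchouk(i, j, n, q) for i in range(n + 1)), q ** n)
            for j in range(n + 1)]

def is_positive_definite(h, q):
    """True when every Krawtchouk coefficient of h (including c_0) is nonnegative."""
    return all(c >= 0 for c in krawtchouk_coefficients(h, q))

n, q = 9, 3
threshold = Fraction((q - 1) * n, q)              # equals 6 here
# Lemma 5.3: a - x is positive definite exactly when a >= threshold.
linear_ok = {a: is_positive_definite([a - x for x in range(n + 1)], q)
             for a in range(n + 1)}
# Corollary 5.4: a product of such linear factors with all a_i >= threshold.
product_values = [(6 - x) * (7 - x) * (9 - x) for x in range(n + 1)]
print(linear_ok, is_positive_definite(product_values, q))
```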

Lemma 5.5. Let $a = (A_0, \dots, A_n)$ be a quasicode whose support consists of
$a_1 < a_2 < \dots < a_s$ and suppose that $a$ is a $(2s - 1)$-design. Then
$h(x) = (a_1 - x)(a_2 - x) \cdots (a_s - x)$ is positive definite.

Proof. Let $c_j$ be as in (5.1). Since $h$ is a degree $s$ polynomial, $c_j = 0$ for $j > s$.
From the sign of the leading coefficient, we find that $c_s > 0$. Now, for $j \le s - 1$,
using the fact that $a$ is a $(2s - 1)$-design, we have by (2.3) and (2.4)

$(q-1)^j \binom{n}{j} c_j = \sum_{i=0}^{n} \binom{n}{i} \left(1 - \frac{1}{q}\right)^i q^{-(n-i)} h(i) K_j(i) = \frac{1}{|a|} \sum_{i=0}^{n} A_i h(i) K_j(i)$.

The right side is nonnegative since $A_i h(i) = 0$ whenever $i \ge 1$ and
$A_0 h(0) K_j(0) \ge 0$. □

Lemma 5.6. For $0 \le j \le n$, the function
$h(x) = (n - j + 1 - x)(n - j + 2 - x) \cdots (n - x)$ is positive definite.

Proof. We have $h(x) = j!\, f_j(x)$, where $f_j$ is the fundamental potential function from
Lemma 4.6. So $\mathbf{c}^t = q^{-n} \mathbf{h}^t K = q^{-n} j!\, \mathbf{f}_j^t K = q^{-j} j!\, \mathbf{f}_{n-j}^t \ge 0$ by (4.1). □

5.2. Polynomial interpolation. In this section we give a discrete analogue of
Hermite interpolation as used in Sections 2.1 and 3 in [CK07]. Hermite interpolation 
is a generalization of Lagrange interpolation where we specify not just the values of 
our desired polynomial at the interpolation points, but also the derivatives up to 
some specified order. In the application of Hermite interpolation in [CK07] , we want 
to bound a function from below using a low degree polynomial, with the additional 
requirement that equality is attained at certain points. This is done by requiring that 
the polynomial agree with the given function to second order at the specified set of 
points. A key fact used in [CK07] was that the Hermite interpolation of an absolutely
monotonic function alternates above and below the original function, accounting
for multiplicity. So in particular, near every interpolation point of multiplicity two 
(i.e., agreement in value and first derivative), the interpolated polynomial stays on 
one side of the function and is tangent to it. We derive a discrete analogue of this 
interpolation technique. 

First we give a discrete analogue of Rolle's theorem.

Lemma 5.7. Let $a < b$ be integers. If a function
$g\colon \{a, a + 1, \dots, b + 1\} \to \mathbb{R}$ satisfies $g(a) g(a + 1) \le 0$ and
$g(b) g(b + 1) \le 0$, then $\Delta g(c) \Delta g(c + 1) \le 0$ for some $a \le c < b$.

Proof. Without loss of generality, we may assume that $g(a) \le 0$ and $g(a + 1) \ge 0$,
since otherwise we can work with $-g$ instead. Since at least one of $g(b) \le 0$ and
$g(b + 1) \le 0$ is true, the sequence $g(a + 1), g(a + 2), \dots, g(b + 1)$ cannot be always
strictly increasing. If $c$ is the smallest integer such that $g(c + 1) \ge g(c + 2)$, then
$\Delta g(c) \ge 0$ and $\Delta g(c + 1) \le 0$, as desired. □

Lemma 5.8. Let $a_1 < a_2 < \dots < a_r$ be integers. If a function

$g\colon \{a_1, a_1 + 1, a_1 + 2, \dots, a_r + 1\} \to \mathbb{R}$

satisfies $g(a_i) g(a_i + 1) \le 0$ for $i = 1, 2, \dots, r$, then there is some integer $c$ such
that $a_1 \le c \le a_r - r + 1$ and $\Delta^{r-1} g(c) \Delta^{r-1} g(c + 1) \le 0$.

Proof. This follows from repeated applications of Lemma 5.7. □

Recall that a function $f\colon \{0, 1, \dots, n\} \to \mathbb{R}$ is absolutely monotonic if all the
finite differences $\Delta^k f$ are nonnegative for $k \ge 0$, and completely monotonic if
$x \mapsto f(n - x)$ is absolutely monotonic.

Now we show that when $f$ is absolutely monotonic, the interpolated polynomial $p$
alternates above and below $f$ between pairs of interpolation points. Furthermore we
show that it is possible to write $p$ as a nonnegative linear combination of terms of
the form $\prod_{i=1}^{j} (x - a_i)$. The corresponding result for completely monotonic
functions follows from the substitution $x \mapsto n - x$. Such decompositions are useful
since we showed in Section 5.1 that certain functions of the form
$\prod_{i=1}^{j} (a_i - x)$ are positive definite.

Lemma 5.9. Let $f\colon \{0, 1, \dots, n\} \to \mathbb{R}$ be an absolutely monotonic function, let
$a_1, a_2, \dots, a_r$ be distinct integers in $\{0, 1, \dots, n\}$, and let $p(x)$ be the unique
polynomial of degree at most $r - 1$ such that $p(a_i) = f(a_i)$ for $i = 1, 2, \dots, r$.

(a) We have

$(f(x) - p(x)) \prod_{i=1}^{r} (x - a_i) \ge 0$

for all $x = 0, 1, \dots, n$.

(b) There exist nonnegative real numbers $c_0, c_1, \dots, c_{r-1}$ such that

$p(x) = \sum_{j=0}^{r-1} c_j \prod_{i=1}^{j} (x - a_i)$

for all $x = 0, 1, \dots, n$.

Part (b) is the discrete analogue of Lemma 10 in [CW12]. Note that it depends
on the ordering of $a_1, \dots, a_r$, but all orderings work. Typically we will use
$a_1 < a_2 < \dots < a_r$.

Proof. Part (a) is trivial when $x$ is equal to some $a_i$, so assume that this is not the
case. Consider the function $g\colon \{0, 1, \dots, n\} \to \mathbb{R}$ given by

$g(t) = f(t) - p(t) - A (t - a_1)(t - a_2) \cdots (t - a_r)$

with the constant $A$ chosen so that $g(x) = 0$; in other words,

$A = \frac{f(x) - p(x)}{\prod_{i=1}^{r} (x - a_i)}$.

We have $g(a_i) = 0$ for $i = 1, 2, \dots, r$ as well as $g(x) = 0$, so applying Lemma 5.8
to $g$ implies that there is some integer $c$ such that $\Delta^r g(c) \Delta^r g(c + 1) \le 0$. Thus,
$\Delta^r g(c') \le 0$ for either $c' = c$ or $c' = c + 1$. Now,

$\Delta^r g(c') = \Delta^r f(c') - A\, r!$,

since the $r$-th difference of any polynomial of degree less than $r$ is zero and that of
$t \mapsto t^r$ is $r!$. Since $f$ is absolutely monotonic, $\Delta^r f(c') \ge 0$, so it follows that
$A \ge 0$. Therefore,

$(f(x) - p(x)) \prod_{i=1}^{r} (x - a_i) = A \prod_{i=1}^{r} (x - a_i)^2 \ge 0$.

This completes the proof of part (a).

For part (b), note that there exist $c_0, \dots, c_{r-1}$ satisfying

$p(x) = \sum_{j=0}^{r-1} c_j \prod_{i=1}^{j} (x - a_i)$

because the right side uses one polynomial of each degree up to $r - 1$. Setting
$x = a_\ell$ gives

$f(a_\ell) = p(a_\ell) = \sum_{j=0}^{\ell-1} c_j \prod_{i=1}^{j} (a_\ell - a_i)$.

This equation involves only the unknowns $c_0, c_1, \dots, c_{\ell-1}$, so we can successively set
$x = a_\ell$ for $\ell = 1, 2, \dots, r$ to solve for each $c_j$. To show that they are nonnegative,
we begin with $c_0 = f(a_1)$. Now for each $\ell$,

$p_\ell(x) = \sum_{j=0}^{\ell-1} c_j \prod_{i=1}^{j} (x - a_i)$

is the unique polynomial of degree less than $\ell$ satisfying $p_\ell(a_i) = f(a_i)$ for
$i = 1, 2, \dots, \ell$. Applying part (a) to $p_\ell$, we find that for $1 \le \ell \le r - 1$,

$0 \le (f(a_{\ell+1}) - p_\ell(a_{\ell+1})) \prod_{i=1}^{\ell} (a_{\ell+1} - a_i)$

$= (p_{\ell+1}(a_{\ell+1}) - p_\ell(a_{\ell+1})) \prod_{i=1}^{\ell} (a_{\ell+1} - a_i)$

$= c_\ell \prod_{i=1}^{\ell} (a_{\ell+1} - a_i)^2$.

It follows that $c_\ell \ge 0$, as desired. □
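Both parts of Lemma 5.9 can be observed on a small example. The sketch below is our own illustration (not taken from the paper): it uses the absolutely monotonic function $f(x) = 2^x$ on $\{0, \dots, 8\}$ and the interpolation points $1, 2, 5, 6$, computes the coefficients $c_j$ by Newton divided differences (a standard way to obtain the expansion in part (b)), and then checks the sign condition of part (a).

```python
from fractions import Fraction

def newton_coefficients(points, f):
    """Divided differences c_0..c_{r-1} with p(x) = sum_j c_j prod_{i<=j} (x - a_i)."""
    coef = [Fraction(f(a)) for a in points]
    r = len(points)
    for level in range(1, r):
        # Update in place from the right so lower-order differences stay available.
        for i in range(r - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (points[i] - points[i - level])
    return coef

def newton_eval(points, coef, x):
    """Evaluate the Newton-form polynomial at x."""
    total, prod = Fraction(0), Fraction(1)
    for a, c in zip(points, coef):
        total += c * prod
        prod *= (x - a)
    return total

n = 8
points = [1, 2, 5, 6]
f = lambda x: 2 ** x          # all finite differences of 2^x are nonnegative
coef = newton_coefficients(points, f)

def sign_condition(x):
    """Part (a): (f(x) - p(x)) * prod_i (x - a_i) >= 0."""
    prod = 1
    for a in points:
        prod *= (x - a)
    return (f(x) - newton_eval(points, coef, x)) * prod >= 0

print(coef, all(sign_condition(x) for x in range(n + 1)))
```

Because the interpolation points come in consecutive pairs, the product $\prod_i (x - a_i)$ is nonnegative at every integer, so here the polynomial stays below $f$ on all of $\{0, \dots, 8\}$.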

Let us rephrase Lemma 5.9 for completely monotonic functions, by substituting
$n - x$ for $x$.

Lemma 5.10. Let $f\colon \{0, 1, \dots, n\} \to \mathbb{R}$ be a completely monotonic function, let
$b_1, b_2, \dots, b_r$ be distinct integers in $\{0, 1, \dots, n\}$, and let $p(x)$ be the unique
polynomial of degree at most $r - 1$ such that $p(b_i) = f(b_i)$ for $i = 1, 2, \dots, r$.

(a) We have

$(f(x) - p(x)) \prod_{i=1}^{r} (b_i - x) \ge 0$

for all $x = 0, 1, \dots, n$.

(b) There exist nonnegative real numbers $c_0, c_1, \dots, c_{r-1}$ such that

$p(x) = \sum_{j=0}^{r-1} c_j \prod_{i=0}^{j-1} (b_{r-i} - x)$

for all $x = 0, 1, \dots, n$.

6. Criteria for universal optimality 

In this section, we use the results from Section 5 to construct auxiliary functions 
h in Proposition 3.4. In the spherical code case [CK07], the key idea was to use 
Hermite interpolation to get a polynomial lower bound for a given potential function 
by requiring that the polynomial be tangent to the potential function at specified 
points. To achieve a similar effect for polynomial interpolation on {0, 1, ... , n}, we 
use Lagrange interpolation but require that the polynomial agree with the potential 
function at pairs of consecutive points. Recall that the support of a code or a
quasicode is the set of $i > 0$ for which $A_i \ne 0$.

Definition 6.1. Given a code $C$ (or a quasicode $a$) of length $n$ over $\mathbb{F}_q$, a pair
covering is a subset $T \subseteq \{1, 2, \dots, n\}$ with elements $b_1 < b_2 < \dots < b_t$ containing
the support of $C$ (or $a$), and such that $b_{2i-1} + 1 = b_{2i}$ whenever $2i \le t$. We also
require $b_t = n$ if $t$ is odd.
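Definition 6.1 can be checked mechanically. The following Python sketch uses a hypothetical helper `is_pair_covering` of our own; it is tried on a code of length 8 whose nonzero distances are 4 and 8 (as for the extended Hamming code), ignoring the trivial distance 0.

```python
def is_pair_covering(T, support, n):
    """Check Definition 6.1 for a candidate set T, a support set, and length n."""
    b = sorted(T)
    t = len(b)
    # T must be a set of distinct elements of {1, ..., n}.
    if len(set(b)) != t or any(x < 1 or x > n for x in b):
        return False
    # T must contain the (nonzero) support.
    if not {i for i in support if i > 0} <= set(b):
        return False
    # Elements pair up as b_1 + 1 = b_2, b_3 + 1 = b_4, ...
    if any(b[k] + 1 != b[k + 1] for k in range(0, t - 1, 2)):
        return False
    # An unpaired last element must equal n.
    return t % 2 == 0 or b[-1] == n

support = {4, 8}
results = {tuple(T): is_pair_covering(T, support, 8)
           for T in ([4, 5, 8], [3, 4, 7, 8], [3, 4, 5, 8], [5, 6, 8])}
print(results)
```

The set $\{3, 4, 5, 8\}$ fails because 5 and 8 are not consecutive, and $\{5, 6, 8\}$ fails because it misses the support element 4.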

For example, for the extended Hamming code with n = 8 and support {0, 4, 8}, 
the sets {4, 5, 8} and {3, 4, 7, 8} are both pair coverings, whereas {3, 4, 5, 8} and 
{5, 6, 8} are not pair coverings.

Proposition 6.2. Let $a$ be a quasicode of length $n$ and $T$ a pair covering of $a$ with
elements $b_1 < b_2 < \dots < b_t$. Then $a$ is universally optimal if the following two
hypotheses are satisfied:

(a) $a$ is a $(t - 1)$-design.

(b) For each $1 \le j \le t - 1$, the function $q_j(x) = \prod_{i=0}^{j-1} (b_{t-i} - x)$ is positive
definite; i.e., it has nonnegative Krawtchouk coefficients.

Note that all results in this section also apply to LP universal optimality of codes, 
by Proposition 4.3. 

Proof. Let $f\colon \{0, 1, \dots, n\} \to \mathbb{R}$ be a completely monotonic function, and let $h$
be the unique polynomial of degree less than $t$ such that $h(x) = f(x)$ for all
$x \in T$. We want to show that $h$ satisfies the complementary slackness conditions in
Proposition 3.4, thereby proving that $a$ is universally optimal.

Apply Lemma 5.10(a) to $f$ and $h$. Since $T$ is a pair covering,
$\prod_{i=1}^{t} (b_i - x) \ge 0$ for all $x$, so $f(x) \ge h(x)$ for all
$x \in \{0, 1, \dots, n\}$. Thus part (a) of Proposition 3.4 is satisfied.

Also by Lemma 5.10(b), there are nonnegative coefficients $c_j$ such that

$h(x) = \sum_{j=0}^{t-1} c_j \prod_{i=0}^{j-1} (b_{t-i} - x)$.

Since each $\prod_{i=0}^{j-1} (b_{t-i} - x)$ has nonnegative Krawtchouk coefficients by
hypothesis (b), we see that $h(x)$ does as well. Since the degree of $h(x)$ is less than $t$,
only polynomials of degree less than $t$ occur in the expansion. Because $a$ is a
$(t - 1)$-design, $A_1^\perp = A_2^\perp = \dots = A_{t-1}^\perp = 0$. Thus, part (b) of
Proposition 3.4 is also satisfied. This shows that $a$ is universally optimal. □

Now we discuss several special cases where part (b) of Proposition 6.2 is easy to 
verify using our results from Section 5.1. 

Proposition 6.3. Let $a$ be a quasicode of length $n$ over $\mathbb{F}_q$, and let $T$ be a pair
covering of $a$ with $|T| = t$. If $a$ is a $(t - 1)$-design, and at most one element of $T$ is
less than $(q - 1)n/q$, then $a$ is universally optimal.

Proof. We only need to check condition (b) of Proposition 6.2. Since at most one
element of $T$ is less than $(q - 1)n/q$, for any $1 \le j \le t - 1$, the product
$\prod_{i=0}^{j-1} (b_{t-i} - x)$ is a product of linear polynomials of the form $b - x$ for
$b \ge (q - 1)n/q$, and hence is positive definite by Corollary 5.4. □
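The hypotheses of Proposition 6.3 are purely combinatorial, so they can be tested directly from a distance distribution. Below is a minimal Python sketch with helper names of our own choosing, applied to the standard distance distribution $(1, 0, 0, 0, 14, 0, 0, 0, 1)$ of the $[8, 4, 4]$ extended Hamming code.

```python
from fractions import Fraction
from math import comb

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def dual_distribution(a, q):
    """MacWilliams transform A_j^perp = (1/N) sum_i A_i K_j(i)."""
    n, size = len(a) - 1, Fraction(sum(a))
    return [sum(a[i] * krawtchouk(j, i, n, q) for i in range(n + 1)) / size
            for j in range(n + 1)]

def design_strength(a, q):
    """Largest t with A_1^perp = ... = A_t^perp = 0."""
    dual, t = dual_distribution(a, q), 0
    while t + 1 < len(dual) and dual[t + 1] == 0:
        t += 1
    return t

def is_pair_covering(T, support, n):
    """Definition 6.1: consecutive pairs covering the support, last element n if |T| is odd."""
    b, t = sorted(T), len(T)
    if len(set(b)) != t or any(x < 1 or x > n for x in b) or not support <= set(b):
        return False
    if any(b[k] + 1 != b[k + 1] for k in range(0, t - 1, 2)):
        return False
    return t % 2 == 0 or b[-1] == n

def satisfies_prop_6_3(a, q, T):
    """Check the three hypotheses of Proposition 6.3 for the pair covering T."""
    n = len(a) - 1
    support = {i for i in range(1, n + 1) if a[i] != 0}
    small = [b for b in T if b * q < (q - 1) * n]     # elements below (q-1)n/q
    return (is_pair_covering(T, support, n)
            and design_strength(a, q) >= len(T) - 1
            and len(small) <= 1)

extended_hamming = [1, 0, 0, 0, 14, 0, 0, 0, 1]
print(design_strength(extended_hamming, 2),
      satisfies_prop_6_3(extended_hamming, 2, [4, 5, 8]))
```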

Proposition 6.4. Let $a$ be a quasicode of length $n$ over $\mathbb{F}_q$. Suppose that $a$ has $s$
nonzero support elements and is a $(2s - 1)$-design. Furthermore, suppose that every
two elements in the support differ by at least 2, and at most one element of the
support is less than $(q - 1)n/q$. Then $a$ is LP universally optimal.

Proof. We shall construct a pair covering that satisfies the conditions of
Proposition 6.2. Suppose that the nonzero elements of the support are
$a_1 < a_2 < \dots < a_s$, so that $a_i \ge (q - 1)n/q$ for all $i \ge 2$. If $a_s < n$, then set

$T = \{a_1 - 1, a_1\} \cup \{a_2, a_2 + 1\} \cup \{a_3, a_3 + 1\} \cup \dots \cup \{a_s, a_s + 1\}$,

and if $a_s = n$, then set

$T = \{a_1 - 1, a_1\} \cup \{a_2, a_2 + 1\} \cup \{a_3, a_3 + 1\} \cup \dots \cup \{a_s\}$.

By construction, $T$ is a pair covering. Let $t = |T|$. When $a_s < n$, we have $t = 2s$,
and when $a_s = n$, we have $t = 2s - 1$. So $a$ is always a $(t - 1)$-design and condition
(a) of Proposition 6.2 is satisfied.

Now we check condition (b) of Proposition 6.2. In the $a_s < n$ case, the partial
product of an initial segment of $(a_s + 1 - x)(a_s - x) \cdots (a_2 + 1 - x)(a_2 - x)$ is positive
definite by Corollary 5.4 since $a_j \ge (q - 1)n/q$ for $j \ge 2$. Also
$(a_s + 1 - x)(a_s - x) \cdots (a_2 + 1 - x)(a_2 - x)(a_1 - x)$ is positive definite since
$(a_s - x)(a_{s-1} - x) \cdots (a_1 - x)$ is positive definite by Lemma 5.5 and
$(a_s + 1 - x)(a_{s-1} + 1 - x) \cdots (a_2 + 1 - x)$ is positive definite by Corollary 5.4,
and so their product is positive definite by Lemma 5.2. This completes the $a_s < n$
case. The $a_s = n$ case is nearly identical. Thus, condition (b) of Proposition 6.2 is
satisfied, and therefore $a$ is universally optimal. □

We also saw the following special case in Proposition 3.5. We repeat the statement
here for convenience, in the language of quasicodes.

Proposition 6.5. Let $a$ be a quasicode. Suppose that $a$ is a 1-design, whose support
consists of either a single integer or two consecutive integers. Then $a$ is universally
optimal.

Here is a criterion that does not quite fit in the context of Proposition 6.2 but is
nevertheless useful.

Proposition 6.6. Let $\mathbf{a}$ be a binary quasicode of length $n$. Suppose $\mathbf{a}$ is supported
at $\{0, a - 1, a, a + 1\}$ where $a$ is odd, while its dual satisfies
$A_1^\perp = A_n^\perp = 0$. Then $\mathbf{a}$ is universally optimal. In fact, $\mathbf{a}$ minimizes
every convex, weakly decreasing potential function.

Proof. Proposition 6.2 is not strong enough for universal optimality in this setting,
since the design strength is too low for the size of the support set. Instead, we
directly work with the complementary slackness constraints in Proposition 3.4. Let
$f$ be a completely monotonic function, and let

$\ell(x) = f(a - 1) + \frac{1}{2} (f(a - 1) - f(a + 1))(a - 1 - x)$

be the linear polynomial that agrees with $f$ at $\{a - 1, a + 1\}$. Let

$h(x) = \ell(x) + \frac{1}{4} (f(a - 1) - 2f(a) + f(a + 1))(K_n(x) - 1)$.

Recall that $K_n(x) = (-1)^x$ when $q = 2$. Note that $h$ agrees with $f$ at
$\{a - 1, a, a + 1\}$, and $h(x) \le \ell(x) \le f(x)$ for $x \notin \{a - 1, a, a + 1\}$ since $f$
is convex. Therefore $h$ satisfies part (a) of Proposition 3.4.

From the construction, we see that $h(x) = c_0 K_0(x) + c_1 K_1(x) + c_n K_n(x)$. Here
$c_1 = (f(a - 1) - f(a + 1))/4 \ge 0$ since $f$ is non-increasing and
$c_n = (f(a - 1) - 2f(a) + f(a + 1))/4 \ge 0$ since $f$ is convex. Thus, part (b) of
Proposition 3.4 is satisfied as well. □
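The construction of $h$ in this proof can be verified numerically. The sketch below is our own illustration: it takes $n = 5$ and $a = 3$ (an arbitrary small choice with $a$ odd), runs over the potentials $f_j(x) = \binom{n-x}{j}$, builds $h$ as in the proof, and expands it in binary Krawtchouk polynomials. Since $K_1(x) = n - 2x$ for $q = 2$, the coefficient of $K_1$ is expected to be $(f(a-1) - f(a+1))/4$.

```python
from fractions import Fraction
from math import comb

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def krawtchouk_coefficients(h, q):
    """Coefficients c_j = q^(-n) sum_i h(i) K_i(j) of the expansion h = sum_j c_j K_j."""
    n = len(h) - 1
    return [Fraction(sum(h[i] * krawtchouk(i, j, n, q) for i in range(n + 1)), q ** n)
            for j in range(n + 1)]

def auxiliary_function(f, a, n):
    """h(x) = l(x) + (1/4)(f(a-1) - 2 f(a) + f(a+1)) ((-1)^x - 1), as in the proof."""
    slope = Fraction(f[a - 1] - f[a + 1], 2)
    curvature = Fraction(f[a - 1] - 2 * f[a] + f[a + 1], 4)
    return [f[a - 1] + slope * (a - 1 - x) + curvature * ((-1) ** x - 1)
            for x in range(n + 1)]

n, a = 5, 3
results = []
for j in range(n + 1):
    f = [comb(n - x, j) for x in range(n + 1)]   # fundamental potential f_j
    h = auxiliary_function(f, a, n)
    c = krawtchouk_coefficients(h, 2)
    results.append((f, h, c))

print([c for _, _, c in results][2])   # coefficients for f_2
```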

7. Universally optimal codes 

In this section, we apply the criteria from Section 6 to specific codes to demonstrate 
their universal optimality. The results are summarized in Table 1.

Proposition 4.5 tells us that the LP universal optimality of a code can be deduced
entirely from its support and dual support, and our criteria from Section 6 use only
this information. In Table 1 we have recorded the support and dual support of
each code for easy reference. For some families of codes, it may happen that the
support or dual support is a subset of the listed set, though this does not change
the analysis.

Table 1. LP universally optimal codes.

We write $a \to b$ to mean $a, a + 1, a + 2, \dots, b$ and $a \xrightarrow{2} b$ to mean
$a, a + 2, a + 4, \dots, b$. Also, $q$ denotes an arbitrary prime power, $r$ an arbitrary
natural number, and $d$ an integer between 1 and $n$; a parameter $k$ occurs for the
Hadamard and conference codes.

Name    q    n    N    Support ⊆    Dual support ⊆

This section is organized as follows. We introduce each family of codes, and for 
each code, we refer to the result that proves its universal optimality. For any specific
code (e.g., the binary Golay code), the universal optimality can be proved by solving 
the appropriate linear program (3.1) (or the one described after Proposition 4.5) 
with software that uses rational arithmetic. However, such computer-aided proofs 
do not feel very satisfying, so whenever possible we refer to results that provide a 
conceptual proof of universal optimality, and we are able to do so for many but not 
all of the codes listed in the table. Note that conceptual proofs are necessary for 
infinite families of codes, such as the Hamming codes. 

Here we recall some terminology used in describing codes that are derived from 
other codes (see [MS77, §1.9]). Let $C$ be a code of length $n$ over $\mathbb{F}_q$ whose distance
distribution's support is contained in $S$. We can perform the following operations
on $C$:

(1) Extend: add one extra coordinate to each codeword of C so that the sum 
of the coordinates in each new codeword is zero. This results in a code of 
length $n + 1$ with support contained in $S \cup (S + 1)$. Furthermore, if the code
is binary, then only even integers can appear in the support of its extension. 

(2) Take the even subcode (assuming q = 2): take the subset of the codewords 
whose sum of coordinates is even. The subcode is supported on the subset 
of S containing even integers. 

(3) Puncture: delete the last coordinate from all codewords, resulting in a code 
of length $n - 1$ supported on $(S \cup (S - 1)) \setminus \{n\}$. In the examples that we
consider, deleting any other coordinate instead of the last coordinate would 
yield an isomorphic code due to symmetry. 

(4) Shorten: take the subset of the codewords whose last digit is 0, and then 
delete the last digit, resulting in a code of length $n - 1$ supported on $S \setminus \{n\}$.

If $C$ is a linear code, then the dual support of $C$ is the support of the dual $C^\perp$.
Puncturing $C$ and then taking the dual results in the same code as shortening $C^\perp$.
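The four operations are straightforward to experiment with on a small code. The Python sketch below is our own illustration (the helper names and the particular generator matrix are assumptions, chosen so that the code is a $[7, 4, 3]$ binary Hamming code); it applies each operation and records the resulting support.

```python
from itertools import product

def span(generators):
    """All GF(2) linear combinations of the generator rows, as a set of tuples."""
    n = len(generators[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
                  for i in range(n))
            for coeffs in product((0, 1), repeat=len(generators))}

def support(code):
    """Set of nonzero Hamming distances between pairs of codewords."""
    return {sum(x != y for x, y in zip(u, v)) for u in code for v in code if u != v}

def extend(code):
    """Append a parity bit so that every codeword has even coordinate sum."""
    return {w + (sum(w) % 2,) for w in code}

def even_subcode(code):
    """Keep the codewords whose coordinate sum is even."""
    return {w for w in code if sum(w) % 2 == 0}

def puncture(code):
    """Delete the last coordinate from every codeword."""
    return {w[:-1] for w in code}

def shorten(code):
    """Keep the codewords ending in 0, then delete the last coordinate."""
    return {w[:-1] for w in code if w[-1] == 0}

# Generator matrix [I | P] of a [7, 4, 3] Hamming code (one standard choice).
hamming = span([(1, 0, 0, 0, 0, 1, 1), (0, 1, 0, 0, 1, 0, 1),
                (0, 0, 1, 0, 1, 1, 0), (0, 0, 0, 1, 1, 1, 1)])

for name, code in [("original", hamming), ("extended", extend(hamming)),
                   ("even subcode", even_subcode(hamming)),
                   ("punctured", puncture(hamming)), ("shortened", shorten(hamming))]:
    print(name, len(code), sorted(support(code)))
```

With $S = \{3, 4, 7\}$ and $n = 7$, the observed supports agree with the containments stated in items (1) through (4).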

7.1. Hamming codes. Let us start with the binary case. We had already seen 
from Example 4.8 that binary Hamming codes [MS77, Ch. 1, §9] are LP universally 
optimal. Recall we first proved that the linear binary simplex code is LP universally 
optimal using Proposition 3.5 (also restated as Proposition 6.5), and then it follows 
by duality (Corollary 4.7) that the binary Hamming code is also LP universally 
optimal. 

Now let us consider some variations on Hamming codes. The $[2^r, 2^r - r - 1, 4]$
extended Hamming code has dual support $\{2^{r-1}, 2^r\}$. The LP universal optimality
of the code follows then from applying Proposition 6.3 to the dual.

The even subcode of the $[2^r - 1, 2^r - r - 1, 3]$ Hamming code is a
$[2^r - 1, 2^r - r - 2, 4]$ code whose dual is formed by adding the all-ones vector as a
new basis vector to the simplex code. Hence the dual of the even subcode of the
Hamming code is supported on $\{2^{r-1} - 1, 2^{r-1}, 2^r - 1\}$. Its LP universal
optimality again follows from applying Proposition 6.3 to the dual.

Similarly, we can shorten the $[2^r - 1, 2^r - r - 1, 3]$ Hamming code to a
$[2^r - 2, 2^r - r - 2, 3]$ code, whose dual is supported on $\{2^{r-1} - 1, 2^{r-1}\}$. This is
LP universally optimal again by applying Proposition 6.3 to the dual.

If we shorten the Hamming code twice, we get a $[2^r - 3, 2^r - r - 3, 3]$ code whose
dual is supported on $\{2^{r-1} - 2, 2^{r-1} - 1, 2^{r-1}\}$. The LP universal optimality of
this code then follows from Proposition 6.6.

If we instead puncture the Hamming code, then we get a $[2^r - 2, 2^r - r - 1, 2]$
code whose dual is supported at $\{2^{r-1}\}$. The LP universal optimality of this code
follows from Proposition 6.5.

General $q$-ary Hamming codes [vL99, §3.3] also behave well with respect to
universal optimality. These codes have type
$[(q^r - 1)/(q - 1), (q^r - 1)/(q - 1) - r, 3]_q$ and their dual is supported at
$\{q^{r-1}\}$. Thus, the LP universal optimality of $q$-ary Hamming codes follows from
applying Proposition 6.5 to the dual.

The shortened $q$-ary Hamming code has type
$[(q^r - q)/(q - 1), (q^r - q)/(q - 1) - r, 3]_q$ and dual support
$\{q^{r-1} - 1, q^{r-1}\}$. The punctured $q$-ary Hamming code has type
$[(q^r - q)/(q - 1), (q^r - 1)/(q - 1) - r, 2]_q$ and dual support $\{q^{r-1}\}$. In both
cases, LP universal optimality follows from applying Proposition 6.5 to the dual.

Of course, any codes with the same distance distributions as these are also LP
universally optimal. See [H08] for a survey of the nonlinear analogues of Hamming 
codes. 

7.2. Simplex code. In Example 3.6 we saw that the $[2^r - 1, r, 2^{r-1}]$ linear binary
simplex code, which is also the dual of a Hamming code, is LP universally optimal. 
More generally, a simplex code is a code in which all pairwise distances are equal. In 
other words, it is supported at a single integer. So by Proposition 6.5 any 1-design 
simplex code is LP universally optimal. 

Recall that being a 1-design is equivalent to every coordinate having an equal 
distribution of the alphabet, and this property is preserved by puncturing. A punc- 
tured simplex code is supported at two consecutive integers, so it is LP universally 
optimal again by Proposition 6.5 as long as the simplex code was a 1-design. 

Shortened and doubly shortened Hadamard codes are examples of simplex codes
that are 1-designs. (These codes are generally nonlinear.) Furthermore, the simplex
1-designs are closed under the operation of repeating each coordinate a fixed number 
of times. 

7.3. Hadamard and conference codes. Now we look at some non-linear codes 
similar to the simplex code. 

Recall [MS77, Ch. 2, §3] that a Hadamard matrix $H_n$ is an $n \times n$ matrix with
entries from $\{-1, 1\}$ and satisfying $H_n H_n^t = n I_n$. Such matrices only exist when $n$
is 1, 2 or a multiple of 4. It is an open conjecture that a Hadamard matrix exists
for all multiples of 4. There are various constructions for Hadamard matrices, such
as Sylvester matrices for $n = 2^k$ and Paley matrices for $n = q + 1$, where $q$ is any
prime power congruent to 3 modulo 4.

An $(n, 2n, n/2)$ Hadamard code [MS77, Ch. 2, §3] is built from a Hadamard
matrix $H_n$ by taking the rows of $(J_n + H_n)/2$ and $(J_n - H_n)/2$, where $J_n$ is the
$n \times n$ matrix containing all 1's. It has the property that the complement of every
codeword is also in the code, and every pair of non-complementary codewords differ
in exactly half of the coordinates. So this code is supported on $\{n/2, n\}$, and its
distance distribution is $A_0 = A_n = 1$, $A_{n/2} = 2n - 2$, and $A_i = 0$ for
$i \notin \{0, n/2, n\}$.

It is also a 3-design. Indeed, $A_j^\perp = 0$ for every odd $j$ due to symmetry about $n/2$.
Namely, for $q = 2$ we have $K_j(n - i) = (-1)^j K_j(i)$, so that $A_i = A_{n-i}$ implies

$\sum_{i=0}^{n} A_i K_j(i) = \sum_{i=0}^{n} A_{n-i} K_j(n - i) = (-1)^j \sum_{i=0}^{n} A_i K_j(i)$.

To check that $A_2^\perp = 0$ for Hadamard codes, we have

$2n\, A_2^\perp = A_0 K_2(0) + A_{n/2} K_2(n/2) + A_n K_2(n)$

$= \binom{n}{2} + (2n - 2)\left(2\binom{n/2}{2} - \left(\frac{n}{2}\right)^2\right) + \binom{n}{2} = 0$.

The LP universal optimality of the Hadamard code then follows from Proposition 6.3. 
This recovers Example 3 from [AB99]. 
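For Sylvester matrices the construction is easy to carry out explicitly. The following Python sketch (helper names are ours) builds the Hadamard code from the Sylvester matrix of order $2^k$, computes its distance distribution by brute force, and measures the design strength through the MacWilliams transform.

```python
from fractions import Fraction
from math import comb

def sylvester(k):
    """Sylvester Hadamard matrix of order 2^k with entries +1 and -1."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def hadamard_code(h):
    """Rows of (J + H)/2 and (J - H)/2 as 0/1 tuples."""
    plus = [tuple((1 + x) // 2 for x in row) for row in h]
    minus = [tuple((1 - x) // 2 for x in row) for row in h]
    return plus + minus

def distance_distribution(code):
    """A_i = (1/|C|) * number of ordered pairs of codewords at distance i."""
    counts = [0] * (len(code[0]) + 1)
    for u in code:
        for v in code:
            counts[sum(x != y for x, y in zip(u, v))] += 1
    return [Fraction(c, len(code)) for c in counts]

def krawtchouk(k, x, n, q):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def design_strength(a, q):
    """Largest t with A_1^perp = ... = A_t^perp = 0."""
    n, size = len(a) - 1, sum(a)
    dual = [sum(a[i] * krawtchouk(j, i, n, q) for i in range(n + 1)) / size
            for j in range(n + 1)]
    t = 0
    while t + 1 <= n and dual[t + 1] == 0:
        t += 1
    return t

dist8 = distance_distribution(hadamard_code(sylvester(3)))
dist16 = distance_distribution(hadamard_code(sylvester(4)))
print(dist8, design_strength(dist8, 2), design_strength(dist16, 2))
```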

If we puncture the Hadamard code, the resulting $(n - 1, 2n, n/2 - 1)$ code remains
a 3-design. It is now supported on $\{n/2 - 1, n/2, n - 1\}$, so its LP universal
optimality again follows from Proposition 6.3.

Conference codes [MS77, Ch. 2, §4] are similar to Hadamard codes. A conference
matrix $C_n$ is an $n \times n$ matrix satisfying $C_n C_n^t = (n - 1) I_n$ with diagonal entries 0
and other entries $-1$ or 1. They exist only when $n \equiv 2 \pmod 4$, but not for all
such values of $n$ (e.g., there are no conference matrices of orders 22, 34, or 58). By an
appropriate normalization, we may assume that the top-left entry of $C_n$ is 0, and
all other entries in the first column and row are 1. Let $S_{n-1}$ be the lower-right
$(n - 1) \times (n - 1)$ submatrix of $C_n$. Then $S_{n-1}$ satisfies
$S_{n-1} S_{n-1}^t = (n - 1) I_{n-1} - J_{n-1}$ and $S_{n-1} J_{n-1} = J_{n-1} S_{n-1} = 0$.
The conference code is the $(n - 1, 2n, n/2 - 1)$ code formed by taking the rows of
$(S_{n-1} + I_{n-1} + J_{n-1})/2$ and $(-S_{n-1} + I_{n-1} + J_{n-1})/2$, plus the zero and
all-ones vectors. It can be checked that the code is a 3-design supported
on $\{n/2 - 1, n/2, n - 1\}$, so its LP universal optimality follows from Proposition 6.3.

7.4. Golay codes. The binary Golay code is a [23, 12, 7] code with many special 
properties [MS77, Ch. 2, §6]. Its extension is a [24, 12, 8] self-dual code. Both codes 
are universally optimal. Indeed, we know their distance distributions (the supports 
and dual supports are recorded in Table 1), so we can run the appropriate linear 
program (e.g., (3.1) or the one described after Proposition 4.5) to show that they 
are LP universally optimal. This was observed in Example 1 of [AB99]. 

We can also give more conceptual proofs of the LP universal optimality of these 
codes without having to explicitly solve any linear programs. For the [23, 12, 7] 
Golay code, the support and dual support are {7, 8, 11, 12, 15, 16, 23} and {8, 12, 16}. 
So its dual is a 6-design with support {8, 12, 16}, which satisfies the conditions 
of Proposition 6.4, thereby confirming LP universal optimality. Similarly, the 
[24, 12, 8] extended Golay code is self-dual with support and dual support both 
equal to {8, 12, 16}, and hence is LP universally optimal by the same reasoning. The 
punctured Golay code is also universally optimal by the same argument. We list 
in Table 1 a few other modifications of the binary Golay code that we found to be 
LP universally optimal by explicitly solving the linear program, but we do not have 
conceptual proofs of their LP universal optimality. 
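The dual support {8, 12, 16} of the [23, 12, 7] Golay code can be recovered from its weight distribution by the MacWilliams (Krawtchouk) transform. The sketch below assumes the standard weight distribution of the Golay code, which is not listed in the text and is supplied here as an outside fact; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def krawtchouk(n, q, j, i):
    """Krawtchouk polynomial K_j(i) for length n and alphabet size q."""
    return sum((-1) ** k * (q - 1) ** (j - k) * comb(i, k) * comb(n - i, j - k)
               for k in range(j + 1))

def dual_distribution(A, q):
    """A_perp[j] = (1/|C|) * sum_i A[i] * K_j(i), where |C| = sum(A)."""
    n = len(A) - 1
    size = sum(A)
    return [Fraction(sum(A[i] * krawtchouk(n, q, j, i) for i in range(n + 1)), size)
            for j in range(n + 1)]

# Assumed input: standard weight distribution of the [23, 12, 7] binary Golay
# code (for a linear code this equals the distance distribution).
GOLAY_WEIGHTS = {0: 1, 7: 253, 8: 506, 11: 1288, 12: 1288, 15: 506, 16: 253, 23: 1}

def golay_dual_distribution():
    """Distance distribution of the dual of the [23, 12, 7] Golay code."""
    A = [GOLAY_WEIGHTS.get(i, 0) for i in range(24)]
    return dual_distribution(A, q=2)

def support(dist):
    """Set of nonzero distances i > 0 that occur in the distribution."""
    return {i for i, a in enumerate(dist) if i > 0 and a != 0}

golay_dual = golay_dual_distribution()
print(sorted(support(golay_dual)))
```

Since the Golay code itself has no nonzero weights below 7, its dual is a 6-design, which together with the three-element support is what Proposition 6.4 is applied to in the text.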

We also looked at the $[11, 6, 5]_3$ ternary Golay code [MS77, Ch. 16, §2] and its variants. 
The support and dual support of each code are listed in Table 1. The LP universal 
optimality of the ternary Golay code and its extended, punctured, and doubly 
punctured modifications follows from applying Proposition 6.4 to the dual. For the 
doubly and triply shortened codes we can apply Proposition 6.3. For the quadruply 
shortened code we can apply Proposition 6.5, and for the triply punctured code we 
can apply Proposition 6.5 to the dual. The shortened ternary Golay code was 
also found to be LP universally optimal, but we do not know of a conceptual proof. 

7.5. MDS codes. Every $(n, N, d)_q$ code must satisfy $N \le q^{n-d+1}$, since its 
minimum distance $d$ guarantees that all codewords remain distinct when restricted 
to any subset of $n - d + 1$ coordinate positions, and there are at most $q^{n-d+1}$ 
words of length $n - d + 1$. Codes where equality is attained, i.e., $N = q^{n-d+1}$, 
are called maximum distance separable (MDS) codes [MS77, Ch. 11]. MDS codes 
are automatically $(n - d + 1)$-designs (by the equidistribution of codewords when 
restricted to any $n - d + 1$ coordinates). The property of being MDS is preserved 
under shortening and puncturing. The dual of a linear MDS code is also MDS. 

Examples of MDS codes include Reed-Solomon codes [MS77, Ch. 10], ovals, and 
hyperovals [H78]. Linear MDS codes and arcs in projective spaces are equivalent 
objects (see [T92]). 
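As a concrete illustration of the MDS property and the resulting design strength, the following sketch builds a small Reed-Solomon code by evaluating polynomials over a prime field and checks both $N = q^{n-d+1}$ and the equidistribution on every set of $n - d + 1$ coordinates. The parameters ($q = 5$, dimension 2, evaluation at all field elements) and the function names are our own choices for illustration.

```python
from collections import Counter
from itertools import combinations, product

def reed_solomon(q, k, points):
    """All evaluation vectors of polynomials of degree < k over the prime field F_q."""
    code = []
    for coeffs in product(range(q), repeat=k):
        word = tuple(sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
                     for a in points)
        code.append(word)
    return code

def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(code, 2))

def is_equidistributed(code, q, t):
    """True if every choice of t coordinates sees each of the q^t patterns equally often."""
    if len(code) % q ** t != 0:
        return False
    expected = len(code) // q ** t
    n = len(code[0])
    for coords in combinations(range(n), t):
        counts = Counter(tuple(x[c] for c in coords) for x in code)
        if len(counts) != q ** t or any(v != expected for v in counts.values()):
            return False
    return True

q, k = 5, 2
points = list(range(q))          # evaluate at every element of F_5
code = reed_solomon(q, k, points)
n = len(points)
d = minimum_distance(code)
print(n, len(code), d)
```

Here `is_equidistributed(code, q, n - d + 1)` tests the design property directly from its definition rather than through the dual distribution.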

Proposition 7.1 (Ashikhmin and Barg). Every MDS code is LP universally opti- 
mal. 

This is Proposition 10 in [AB99]. 

Proof. Let $C$ be an $(n, q^{n-d+1}, d)_q$ MDS code. It is an $(n - d + 1)$-design. We apply 
Proposition 6.2 with $T = \{d, d + 1, \dots, n\}$. The functions $\prod_{i=0}^{j} (n - i - x)$ are 
positive definite for all $j$ by Lemma 5.6. Thus, the conditions of Proposition 6.2 are 
satisfied, and $C$ is LP universally optimal. □ 

7.6. Oval and ovoid codes. For any set of $n$ points in the projective space $\mathbb{P}^k(\mathbb{F}_q)$, 
we can construct a linear code in $\mathbb{F}_q^n$ as follows: let $A$ be a $(k + 1) \times n$ matrix 
whose columns are representatives of the $n$ points in $\mathbb{F}_q^{k+1}$, and then take the linear 
subspace of $\mathbb{F}_q^n$ generated by the rows of $A$. 

An $n$-cap in the projective space $\mathbb{P}^k(\mathbb{F}_q)$ is a set of $n$ points with no three collinear. 
This assumption is vacuous if $q = 2$, so we assume $q > 2$ from now on. A cap code 
is a linear code generated by a cap [H78]. Note that every cap code is a 3-design, 
since the dual code has minimum weight at least four, as no three points in the cap 
are collinear. 

When $q$ is odd, the largest cap in $\mathbb{P}^2(\mathbb{F}_q)$ has size $q + 1$, and such a cap is called 
an oval. When $q = 2^m$, the largest cap has size $q + 2$, and it is called a hyperoval. As 
mentioned earlier, codes arising from ovals and hyperovals are all MDS, and hence 
universally optimal. 

An ovoid is a $(q^2 + 1)$-cap in $\mathbb{P}^3(\mathbb{F}_q)$. See [O'K96] for a survey on ovoids. The 
$[q^2 + 1, 4]_q$ code associated to an ovoid is supported on $\{q^2 - q, q^2\}$, as every plane 
in $\mathbb{P}^3(\mathbb{F}_q)$ contains either 1 or $q + 1$ points of the ovoid (see Lemma 1.3 in [O'K96]). 
It then follows by Proposition 6.3 that every ovoid code is LP universally optimal. 
Proposition 6.3 also shows that the shortened, doubly shortened, and punctured 
versions of every ovoid code are LP universally optimal. 

7.7. Nordstrom-Robinson code. The Nordstrom-Robinson code [NR67] (see 
[MS77, Ch. 2, §8]) is the unique binary code of type (16, 256, 6). It is non-linear, 
with support {6, 8, 10, 16} and distance distribution $A_0 = A_{16} = 1$, $A_6 = A_{10} = 112$, 
$A_8 = 30$. The Nordstrom-Robinson code has the interesting property that its 
quasicode is self-dual, i.e., $a = a^\perp$, even though the code is non-linear and hence 
not self-dual. 

By explicitly solving the linear program (3.1), one can show that the Nordstrom- 
Robinson code is LP universally optimal (as first shown in Example 2 of [AB99]), 
as are its punctured, shortened, and doubly shortened versions. We do not have 
conceptual proofs of these results. 
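The self-duality of the Nordstrom-Robinson quasicode can be confirmed directly from the distance distribution quoted above. This is a minimal sketch, assuming the normalization $A_j^\perp = |C|^{-1} \sum_i A_i K_j(i)$; the names are hypothetical.

```python
from fractions import Fraction
from math import comb

def krawtchouk(n, q, j, i):
    """Krawtchouk polynomial K_j(i) for length n and alphabet size q."""
    return sum((-1) ** k * (q - 1) ** (j - k) * comb(i, k) * comb(n - i, j - k)
               for k in range(j + 1))

def dual_distribution(A, q):
    """A_perp[j] = (1/|C|) * sum_i A[i] * K_j(i), where |C| = sum(A)."""
    n = len(A) - 1
    size = sum(A)
    return [Fraction(sum(A[i] * krawtchouk(n, q, j, i) for i in range(n + 1)), size)
            for j in range(n + 1)]

# Distance distribution of the (16, 256, 6) Nordstrom-Robinson code, as in the text.
NR = {0: 1, 6: 112, 8: 30, 10: 112, 16: 1}
nr_dist = [NR.get(i, 0) for i in range(17)]
nr_dual = dual_distribution(nr_dist, q=2)
print(nr_dual == nr_dist)
```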

8. Removing a codeword from a code 

In this section, we show that removing a single codeword from an LP universally 
optimal code always yields a universally optimal code. This surprising fact will 
follow from a strengthening of the Delsarte linear program due to Ashikhmin and 
Simonis [AS98]. It is not always true without LP universal optimality: in $\mathbb{F}_2^2$, the 
three-point code {(0, 0), (0, 1), (1, 1)} is universally optimal, but {(0, 0), (0, 1)} is 
not. 

Consider removing a random codeword $r$ from a code $C$ of size $N$ to obtain a 
new code $C \setminus \{r\}$. Let $(A_0, A_1, A_2, \dots, A_n)$ be the distance distribution of $C$, and 
let $(B_0, B_1, B_2, \dots, B_n)$ be the expectation of the distance distribution of $C \setminus \{r\}$. 

Given $(x, y) \in C^2$ with $x \ne y$, the probability that neither will be removed is 
$\frac{N-1}{N} \cdot \frac{N-2}{N-1}$. Thus, for $i \ge 1$, 

$$B_i = \frac{1}{N-1} \cdot \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdot \bigl|\{(x, y) \in C^2 : |x - y| = i\}\bigr| = \frac{N-2}{N-1} A_i.$$

In particular, the Delsarte inequalities 

$$\sum_{i=0}^n A_i K_j(i) \ge 0$$

hold if and only if 

$$(N - 1) \sum_{i=0}^n B_i K_j(i) \ge K_j(0) = (q - 1)^j \binom{n}{j}.$$
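The equivalence of the two families of inequalities can be spelled out in one line. The derivation below is a sketch that uses only $A_0 = B_0 = 1$ and $B_i = \frac{N-2}{N-1} A_i$ for $i \ge 1$, and it assumes $N \ge 3$ so that the factor $N - 2$ is positive.

```latex
\begin{align*}
(N-1)\sum_{i=0}^{n} B_i K_j(i)
  &= (N-1) K_j(0) + (N-2)\sum_{i=1}^{n} A_i K_j(i) \\
  &= (N-1) K_j(0) + (N-2)\Bigl(\sum_{i=0}^{n} A_i K_j(i) - K_j(0)\Bigr) \\
  &= K_j(0) + (N-2)\sum_{i=0}^{n} A_i K_j(i).
\end{align*}
```

Hence the left side is at least $K_j(0)$ exactly when $\sum_{i=0}^n A_i K_j(i) \ge 0$.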

Ashikhmin and Simonis proved that these stronger inequalities hold for every 
code whose size is not a multiple of the alphabet size $q$. In fact, their bound is in 
many cases even stronger (they replace one of the factors of $q - 1$ with $c(q - c)$, 
where $|C| \equiv c \pmod q$ and $0 < c \le q - 1$), but we will not require that refinement. 

Proposition 8.1 (Ashikhmin and Simonis [AS98]). Let $C$ be a code of length $n$ 
over an alphabet of size $q$ such that $q$ does not divide $|C|$, and let $(A_0, \dots, A_n)$ 
be its distance distribution. Then for $0 \le j \le n$, 

$$|C| \sum_{i=0}^n A_i K_j(i) \ge (q - 1)^j \binom{n}{j}.$$

We give a streamlined version of the proof from [AS98]. 

Proof. Fix a total ordering $<$ on $\mathbb{F}_q$, chosen arbitrarily. Define a functional $f$ to 
be a sequence $(f_1, \dots, f_n)$ of maps $f_i \colon \mathbb{F}_q \to \mathbb{R}$. For any $x \in \mathbb{F}_q^n$, write $f(x) = 
f_1(x_1) f_2(x_2) \cdots f_n(x_n)$, and write $f(C) = \sum_{x \in C} f(x)$. For any $S \subseteq \{1, \dots, n\}$, 
say that $f$ has type $S$ if $f_i = 1$ identically whenever $i \notin S$, and for $i \in S$, there 
exist $a_i, b_i \in \mathbb{F}_q$ such that $a_i < b_i$, $f_i(a_i) = 1$, $f_i(b_i) = -1$, and $f_i(c) = 0$ for all 
$c \in \mathbb{F}_q \setminus \{a_i, b_i\}$. 

First we claim that 

$$|C| \sum_{i=0}^n A_i K_j(i) = \sum_{|S| = j} \; \sum_{\operatorname{type}(f) = S} f(C)^2. \tag{8.1}$$

Here the first sum is taken over all subsets $S \subseteq \{1, \dots, n\}$ of size $j$ and the second 
over all functionals $f$ of type $S$. The right side is clearly nonnegative, which implies 
the Delsarte inequalities, and we will analyze it more carefully to get a lower bound 
of $(q - 1)^j \binom{n}{j}$. 

To prove (8.1), observe that the right side equals 

$$\sum_{x, y \in C} \; \sum_{|S| = j} \; \sum_{\operatorname{type}(f) = S} f(x) f(y).$$

For fixed $x, y \in C$, with $d = |x - y|$, there exist $\binom{d}{k} \binom{n-d}{j-k}$ choices for $S$ so that exactly 
$k$ of the coordinates where $x$ and $y$ differ fall in $S$. For each such coordinate $c$ 
there is a unique choice of $f_c$ that makes $f_c(x_c) f_c(y_c)$ nonzero, and we 
always have $f_c(x_c) f_c(y_c) = -1$. For the other coordinates $c \in S$, where $x_c = y_c$, 
there are $q - 1$ choices for $f_c$ that make $f_c(x_c) f_c(y_c)$ nonzero, and we always have 
$f_c(x_c) f_c(y_c) = 1$. Thus the sum of $f(x) f(y)$ over all functionals $f$ of type $S$ for 
some $S$ with $|S| = j$ is equal to 

$$\sum_{k=0}^{j} (-1)^k (q - 1)^{j-k} \binom{d}{k} \binom{n-d}{j-k} = K_j(d).$$

Summing over all $x, y \in C$ yields (8.1). 

To complete the proof, it suffices to show that for each subset $S$ of $\{1, \dots, n\}$ 
of size $j$, there exist $(q - 1)^j$ functionals $f$ of type $S$ so that $f(C) \not\equiv 0 \pmod q$ 
(each such $f(C)$ is then a nonzero integer, so $f(C)^2 \ge 1$). 
We show this by induction on $j$. When $j = 0$, $f(C) = |C| \not\equiv 0 \pmod q$. Suppose 
the induction hypothesis is true for $j - 1$. Let $|S| = j$, and $i \in S$. Let $f$ be any 
functional of type $S \setminus \{i\}$ with $f(C) \not\equiv 0 \pmod q$. It remains to find $q - 1$ different 
functionals $f'$ of type $S$ that agree with $f$ on $S \setminus \{i\}$ and satisfy $f'(C) \not\equiv 0 \pmod q$. 

For each $a \in \mathbb{F}_q$, define $f^a$ as follows: $f^a_k = f_k$ for $k \ne i$, $f^a_i(a) = 1$, and $f^a_i(b) = 0$ 
for $b \in \mathbb{F}_q \setminus \{a\}$. Then $f(x) = \sum_{a \in \mathbb{F}_q} f^a(x)$ for all $x \in \mathbb{F}_q^n$. Since $f(C) \not\equiv 0 \pmod q$, 
the sums $f^a(C)$ cannot all have the same residue mod $q$, and thus we can find at 
least $q - 1$ pairs $\{a, b\} \subseteq \mathbb{F}_q$ such that $f^a(C) - f^b(C) \not\equiv 0 \pmod q$. Because $f^a - f^b$ 
or its negative is a functional of type $S$ (depending on whether $a < b$), this yields 
the desired $q - 1$ extensions of $f$. □ 
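Identity (8.1) and the resulting bound can be verified by brute force on a small example. The sketch below enumerates all functionals of each type for a ternary code of length 3 and size 4 (so that $q$ does not divide $|C|$); the particular code and the function names are our own choices for illustration.

```python
from itertools import combinations, product
from math import comb

def krawtchouk(n, q, j, i):
    """Krawtchouk polynomial K_j(i) for length n and alphabet size q."""
    return sum((-1) ** k * (q - 1) ** (j - k) * comb(i, k) * comb(n - i, j - k)
               for k in range(j + 1))

def lhs_8_1(code, q, j):
    """|C| * sum_i A_i K_j(i), computed as the sum of K_j(|x - y|) over ordered pairs."""
    n = len(code[0])
    return sum(krawtchouk(n, q, j, sum(a != b for a, b in zip(x, y)))
               for x in code for y in code)

def rhs_8_1(code, q, j):
    """Sum of f(C)^2 over all functionals f of type S, over all S with |S| = j."""
    n = len(code[0])
    ordered_pairs = list(combinations(range(q), 2))   # (a, b) with a < b
    total = 0
    for S in combinations(range(n), j):
        for choice in product(ordered_pairs, repeat=j):
            f_of_C = 0
            for x in code:
                value = 1
                for coord, (a, b) in zip(S, choice):
                    if x[coord] == a:
                        continue            # f_i(a_i) = 1
                    if x[coord] == b:
                        value = -value      # f_i(b_i) = -1
                    else:
                        value = 0           # f_i vanishes elsewhere
                        break
                f_of_C += value
            total += f_of_C ** 2
    return total

q, n = 3, 3
C = [(0, 0, 0), (0, 1, 2), (1, 1, 0), (2, 0, 1)]   # |C| = 4 is not a multiple of 3
for j in range(n + 1):
    print(j, lhs_8_1(C, q, j), rhs_8_1(C, q, j), (q - 1) ** j * comb(n, j))
```

Each printed row lists $j$, the two sides of (8.1), and the lower bound $(q-1)^j \binom{n}{j}$ from Proposition 8.1.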

This proposition and the even stronger bounds from [AS98] strengthen the 
Delsarte linear program and enable one to prove better energy bounds for codes 
whose size is not divisible by the alphabet size. 

Lemma 8.2. Let $f$ be any potential function. If the Delsarte linear program proves 
that a code $C \subseteq \mathbb{F}_q^n$ minimizes $f$-potential energy, then either $|C|$ is a multiple of $q$ 
or $f$ takes on its lowest value at all the distances between pairs of distinct codewords 
in $C$. 

Proof. Suppose $|C|$ is not a multiple of $q$. Then Proposition 8.1 shows that the dual 
distribution of $C$ is strictly positive, and thus the auxiliary function in Proposition 3.4 
must be a constant function. In that case the conclusion follows, because the auxiliary 
function is less than or equal to $f$ everywhere and equal at the distances between 
codewords. □ 

Corollary 8.3. Every LP universally optimal code in $\mathbb{F}_q^n$ has size a multiple of $q$ 
unless all pairs of distinct points in the code are at distance $n$. 

In the latter case, there can be at most q points in the code. 

Now we prove Theorem 1.3 and the remarks following it. Namely, we show that 
if the Delsarte linear program proves that a code minimizes $f$-potential energy, then 
the code with any single codeword removed also minimizes $f$-potential energy. 

Proposition 8.4. Let $C \subseteq \mathbb{F}_q^n$ be a code and let $f \colon \{1, 2, \dots, n\} \to \mathbb{R}$ be any 
function (not necessarily completely monotonic) such that the distance distribution 
of $C$ minimizes $f$-potential energy among all quasicodes of the same length and 
size (i.e., $C$ and $f$ satisfy the hypotheses of Proposition 3.4). Let $x \in C$. Then 
$E_f(C \setminus \{x\}) \le E_f(C')$ for every code $C' \subseteq \mathbb{F}_q^n$ with $|C'| = |C| - 1$. 

Proof. By Lemma 8.2 we may assume that $|C|$ is a multiple of $q$, because the other 
case in the lemma is trivial. 

Let $N = |C|$, let $(1, A_1, \dots, A_n)$ be the distance distribution of $C$, and let 
$(1, B_1, \dots, B_n)$ be the expected distance distribution after removing a random 
codeword. As explained at the beginning of the section, 

$$B_i = \frac{N-2}{N-1} A_i$$

for $i > 0$, and the Delsarte inequalities 

$$\sum_{i=0}^n A_i K_j(i) \ge 0$$

are equivalent to the Ashikhmin-Simonis inequalities 

$$(N - 1) \sum_{i=0}^n B_i K_j(i) \ge (q - 1)^j \binom{n}{j}.$$

It follows that minimality of the $f$-potential energy of $C$ subject to the Delsarte 
inequalities is equivalent to minimality of the expected energy subject to the 
Ashikhmin-Simonis inequalities. Note that we can apply Proposition 8.1 to codes of 
size $N - 1$, because $N$ is a multiple of $q$ and hence $N - 1$ is not. 

Thus, no code of size $N - 1$ in $\mathbb{F}_q^n$ can have lower $f$-potential energy than the 
expectation of removing a random codeword from $C$. Removing different codewords 
might yield non-isomorphic codes. However, by linearity of expectation they must all 
have the same energy, since none of them can have lower energy than the average. It 
follows that $C \setminus \{x\}$ minimizes $f$-potential energy among all codes of size $|C| - 1$. □ 

In particular, by letting $f$ vary over all completely monotonic functions, we see 
that if $C$ is LP universally optimal, then $C \setminus \{x\}$ is universally optimal for all $x \in C$. 
All of these codes C \ {x} must have the same distance distribution, since they have 
the same energy for all completely monotonic potential functions, which span the 
space of all potential functions. Thus C must be distance regular. 

This completes the proof of Theorem 1.3. We find the result quite surprising, 
and the role of the Ashikhmin-Simonis inequalities in the proof is mysterious. It is 
natural to look for other proofs of these inequalities. There is a much simpler proof 
for binary codes (Theorem 5 in [BBMOS78]), which we have been able to generalize 
to alphabets of prime power order but no further. The elegant proof of the Delsarte 
inequalities in [SV91] can be adapted to give a proof of the Ashikhmin-Simonis 
inequalities as well, but in fact there is an error in [SV91]: equation (13") is incorrect 
and the map a is not well defined for a general alphabet. When the alphabet has 
prime power order, the proof works, but we see no way to salvage it in general. 
Of course, restricting to prime power order is not so bad, because we know of no 
examples of LP universal optima over other alphabets beyond the trivial examples 
with all distances maximal, but they may exist. 

One consequence of Theorem 1.3 is that the distance distribution of an LP 
universally optimal code must be integral, which puts strong constraints on when a 
universally optimal quasicode can be realized by an actual code. For example, 

(1,0,7/2,9/2,1/2,1/2) 

is the universally optimal binary quasicode of size 10 and length 5. Because it is 
not integral, there exists no LP universally optimal code of size 10 in $\mathbb{F}_2^5$. 
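As a sanity check on this example, one can confirm with exact arithmetic that the vector above is a quasicode of size 10 (its entries sum to 10 and it satisfies the Delsarte inequalities) and that it is not integral. The sketch below verifies feasibility only; it does not re-solve the linear program, so optimality is taken from the text. The names are hypothetical.

```python
from fractions import Fraction
from math import comb

def krawtchouk(n, q, j, i):
    """Krawtchouk polynomial K_j(i) for length n and alphabet size q."""
    return sum((-1) ** k * (q - 1) ** (j - k) * comb(i, k) * comb(n - i, j - k)
               for k in range(j + 1))

def dual_distribution(A, q):
    """A_perp[j] = (1/N) * sum_i A[i] * K_j(i), where N = sum(A)."""
    n = len(A) - 1
    size = sum(A)
    return [sum(A[i] * krawtchouk(n, q, j, i) for i in range(n + 1)) / size
            for j in range(n + 1)]

# The quasicode quoted in the text: binary, length 5, size 10.
quasicode = [Fraction(1), Fraction(0), Fraction(7, 2),
             Fraction(9, 2), Fraction(1, 2), Fraction(1, 2)]
quasicode_dual = dual_distribution(quasicode, q=2)

size = sum(quasicode)
is_feasible = all(a >= 0 for a in quasicode) and all(b >= 0 for b in quasicode_dual)
is_integral = all(a.denominator == 1 for a in quasicode)
print(size, is_feasible, is_integral)
```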

9. Further questions and generalizations 

We conclude with some open questions and generalizations that arose during this 
investigation. 

For large block lengths, how many universal optima are there? We expect that 
they must be rare, but we do not know how to rule out the possibility that there 
exist universal optima of almost all block lengths and sizes. 

In the study of universally optimal spherical codes by Cohn and Kumar [CK07], 
it was shown that every $m$-distance code that is a $(2m - 1)$-design is universally 
optimal [CK07, Theorem 1.2], in analogy with results of Levenshtein about codes 
and packings [L92, L95]. This condition is similar to the hypotheses of 
Proposition 6.2 with part (b) dropped. We suspect that condition (b) in Proposition 6.2 is 
unnecessary (perhaps with some additional mild hypotheses added), but we do not 
know how to prove this in general. We worked around this issue in Proposition 6.3 
by adding the additional hypothesis that at most one element in the pair covering is 
less than $(q - 1)n/q$. In the special case of 1-designs, we saw in Proposition 6.5 that 
condition (b) can be omitted. These results suffice for all the examples we are aware 
of, so the question of whether condition (b) is necessary is primarily of theoretical 
interest, but it would be pleasant to state as simple a general theorem as possible. 

In the introduction we mentioned that universally optimal codes of length 5 and 
size $N$ exist if and only if $N \notin \{9, 12, 13, 14, 18, 19, 20, 23\}$, and such a code is unique 
except when $N = 5$ or $N = 27$, in which case there are two isomorphism classes. 
There is a simple explanation for the symmetry $N \leftrightarrow 2^5 - N$. The unoccupied 
locations in a code can be viewed as antiparticles, which are subject to exactly the 
same forces as the original particles: 

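Lemma 9.1. If $C \subseteq \mathbb{F}_q^n$ and $f$ is any potential function, then the complementary 
code $\mathbb{F}_q^n \setminus C$ satisfies 

$$(q^n - |C|)\, E_f(\mathbb{F}_q^n \setminus C) = |C|\, E_f(C) + (q^n - 2|C|) \sum_{k=1}^n \binom{n}{k} (q - 1)^k f(k).$$

Thus, $\mathbb{F}_q^n \setminus C$ is universally optimal if and only if $C$ is. 

See also Section 1.3.4 of [K07] for an essentially equivalent lemma. 

Proof. By inclusion-exclusion, we have 

$$\sum_{\substack{x, y \in \mathbb{F}_q^n \setminus C \\ x \ne y}} f(|x - y|) = \sum_{\substack{x, y \in \mathbb{F}_q^n \\ x \ne y}} f(|x - y|) - 2 \sum_{\substack{x \in \mathbb{F}_q^n,\, y \in C \\ x \ne y}} f(|x - y|) + \sum_{\substack{x, y \in C \\ x \ne y}} f(|x - y|)$$

$$= (q^n - 2|C|) \sum_{k=1}^n \binom{n}{k} (q - 1)^k f(k) + \sum_{\substack{x, y \in C \\ x \ne y}} f(|x - y|).$$

□ 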
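Lemma 9.1 is easy to test by brute force on a small ambient space. The sketch below assumes the normalization $E_f(C) = |C|^{-1} \sum_{x \ne y} f(|x - y|)$, with the sum over ordered pairs of distinct codewords, which is the normalization under which the displayed identity matches the inclusion-exclusion computation. The example code, the potential $f(k) = 1/k^2$, and the function names are our own choices.

```python
from fractions import Fraction
from itertools import product
from math import comb

def hamming(x, y):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(x, y))

def energy(code, f):
    """E_f(C) = (1/|C|) * sum of f(|x - y|) over ordered pairs x != y (assumed normalization)."""
    total = sum(f(hamming(x, y)) for x in code for y in code if x != y)
    return Fraction(total) / len(code)

def lemma_9_1_sides(code, q, n, f):
    """Return the left and right sides of the identity in Lemma 9.1."""
    members = set(code)
    complement = [x for x in product(range(q), repeat=n) if x not in members]
    left = (q ** n - len(code)) * energy(complement, f)
    whole_space = sum(comb(n, k) * (q - 1) ** k * f(k) for k in range(1, n + 1))
    right = len(code) * energy(code, f) + (q ** n - 2 * len(code)) * whole_space
    return left, right

def inverse_square(k):
    """Example potential function f(k) = 1 / k^2."""
    return Fraction(1, k * k)

example_code = [(0, 0), (0, 1), (1, 2), (2, 2)]
left, right = lemma_9_1_sides(example_code, q=3, n=2, f=inverse_square)
print(left, right)
```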

Linear programming bounds do not respect the antiparticle symmetry from 
Lemma 9.1. For example, in $\mathbb{F}_2^3$ there is an LP universally optimal code of size 2 
but not one of size 6. This asymmetry can be used to strengthen the bounds for 
codes containing more than half of the points in the ambient space. Specifically, 
consider a code in $\mathbb{F}_q^n$ with distance distribution $a$ and size $N$. Let $b$ be the distance 
distribution of its complement, and let $c$ be that of the entire space $\mathbb{F}_q^n$. Then 
Lemma 9.1 amounts to 

$$b = \frac{N}{q^n - N}\, a + \frac{q^n - 2N}{q^n - N}\, c.$$

The vector $b$ is a convex combination of $a$ and $c$ when $N \le q^n/2$, and in that case 
the Delsarte inequalities for $b$ follow from those for $a$ and $c$, so they provide no 
additional information. However, when $N > q^n/2$, switching the roles of $a$ and $b$ 
above shows that the inequalities on $b$ are at least as strong as those on $a$, and 
they may be stronger. Thus, when applying linear programming bounds, one should 
always pass to the complement when $N > q^n/2$. Of course, not many important 
codes contain a majority of the ambient space. 

With the exception of a few small cases, all of our proofs of universal optimality 
are based on linear programming bounds, if we take into account the antiparticle 
symmetry and the extensions in Section 8. We would like to see a nontrivial 
example that cannot be proved using these techniques. One possible candidate is the 
(13, 32, 6) Nadler code [MS77, p. 538], which is known to be optimal in the sense that 
there is no code of size greater than 32 in $\mathbb{F}_2^{13}$ with minimum distance at least 6. The 
proof of optimality [BBMOS78] involves adding further constraints to the Delsarte 
linear program. Specifically, by puncturing and then extending the code one can 
assume that all Hamming distances between codewords are even; furthermore, a 
simple combinatorial argument gives another linear constraint involving distances 10 
and 12. We suspect that the Nadler code is universally optimal. One can check that 
its distance distribution does not optimize the linear program (3.1), but it does if 
we add the same constraints as above. However, when proving universal optimality, 
we do not know how to apply the same simplifying assumptions used in the proof of 
optimality. The universal optimality of the Nadler code remains an open question. 

Beyond linear programming bounds, are there more systematic or principled 
techniques that could be applied? Semidefinite programming bounds [S05, GST06] 
are the most powerful approach known to proving coding theory bounds. They have 
been applied to potential energy minimization and used to prove universal optimality 
for a configuration in projective space [CW12], but we have not investigated them 
in the setting of this paper. 






Many of our results generalize straightforwardly to the setting of metric and co- 
metric association schemes, i.e., distance-regular graphs under the graph metric with 
the "Q-polynomial" property, which yields replacements for the Krawtchouk polyno- 
mials [BCN89, DL98, MT09]. There are several noteworthy omissions, namely the 
theory of duality (including the definition of the dual quasicode and Proposition 1.2) 
and the results of Section 8. However, the results of Sections 5 and 6 all generalize if 
{q — l)n/q is replaced with the average distance between a pair of randomly selected 
points in the graph, with the exception of Lemma 5.6 (which is needed only for 
MDS codes) and Proposition 6.6. The proofs are essentially identical. 

We have not attempted to compile a careful list of examples for this more general 
theory, along the lines of Table 1, but there are several interesting applications. For 
example, consider the Johnson space of binary vectors of length n and weight w. 
Every projective plane of order $q$ yields an $S(2, q + 1, q^2 + q + 1)$ Steiner system 
and thus a configuration of $q^2 + q + 1$ points in the Johnson space with parameters 
$(n, w) = (q^2 + q + 1, q + 1)$. This configuration is a simplex and a 2-design, so it is 
universally optimal. For a somewhat deeper example, the S{5, 8, 24) Steiner system 
is also LP universally optimal. 
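The smallest instance of the projective plane example is the Fano plane ($q = 2$), which gives 7 points in the Johnson space with $(n, w) = (7, 3)$. The following sketch checks the Steiner property and the simplex property (all pairwise Hamming distances equal). The construction from the difference set {0, 1, 3} modulo 7 and the function names are our own choices for illustration.

```python
from itertools import combinations

def fano_lines():
    """Lines of the projective plane of order 2, from the difference set {0, 1, 3} mod 7."""
    return [frozenset({i % 7, (i + 1) % 7, (i + 3) % 7}) for i in range(7)]

def indicator(block, n):
    """Binary vector of length n with ones exactly at the points of the block."""
    return tuple(1 if p in block else 0 for p in range(n))

def is_steiner_2(blocks, n):
    """True if every pair of points lies in exactly one block."""
    return all(sum(1 for b in blocks if p in b and r in b) == 1
               for p, r in combinations(range(n), 2))

def pairwise_distances(words):
    """Set of Hamming distances between distinct words."""
    return {sum(a != b for a, b in zip(x, y)) for x, y in combinations(words, 2)}

lines = fano_lines()
words = [indicator(line, 7) for line in lines]
print(is_steiner_2(lines, 7), pairwise_distances(words))
```

Two lines of a projective plane of order $q$ meet in exactly one point, so the corresponding vectors are always at distance $2q$, which is what makes the configuration a simplex.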

The role of duality in association scheme theory is well understood (see Section 2.6 
in [D73]), and it does not generalize to arbitrary metric association schemes. However, 
the results of Section 8 are far more mysterious, and we have no idea how far 
they might generalize. In particular, we have no conceptual explanation for why 
Proposition 8.1 turns out to be exactly what we require to analyze removing one 
point from an LP universal optimum. Any progress on generalizing either the results 
or the proof techniques to other association schemes would be exciting. 

Finally, we believe that there is still far more to say about analogies between coding 
theory and physics, with numerous cases in which parallel work has unknowingly 
been done in these areas. For example, the study in [S01] of the Delsarte bounds for 
codes is perfectly analogous to the physics-inspired work in [TS06] on realizability 
of pair correlation functions based on the nonnegativity of the structure factor. We 
hope that developing a common framework will lead to advances in both fields. 

Acknowledgments 
We thank Alexander Barg for providing valuable feedback and suggestions. 

References 

[AB99] A. Ashikhmin and A. Barg, Binomial moments of the distance distribution: bounds and applications, IEEE Trans. Inform. Theory 45 (1999), 438-452. MR1677009 doi:10.1109/18.748994

[ABL01] A. Ashikhmin, A. Barg, and S. Litsyn, Estimates of the distance distribution of codes and designs, IEEE Trans. Inform. Theory 47 (2001), 1050-1061. MR1829331 doi:10.1109/18.915662

[AS98] A. Ashikhmin and J. Simonis, On the Delsarte inequalities, Linear Algebra Appl. 269 (1998), 197-217. MR1483528 doi:10.1016/S0024-3795(97)00065-7

[BBMOS78] M. R. Best, A. E. Brouwer, F. J. MacWilliams, A. M. Odlyzko, and N. J. A. Sloane, Bounds for binary codes of length less than 25, IEEE Trans. Inform. Theory IT-24 (1978), 81-93. MR0479645 doi:10.1109/TIT.1978.1055827

[BDL12] N. Bouman, J. Draisma, and J. van Leeuwaarden, Energy minimisation of repelling particles on a toric grid, preprint, 2012. arXiv:1203.0408

[BCN89] A. E. Brouwer, A. M. Cohen, and A. Neumaier, Distance-regular graphs, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) 18, Springer-Verlag, Berlin, 1989. MR1002568



[C10] H. Cohn, Order and disorder in energy minimization, Proceedings of the International Congress of Mathematicians. Volume IV, 2416-2443, Hindustan Book Agency, New Delhi, 2010. MR2827978 doi:10.1142/9789814324359_0152 arXiv:1003.3053

[CK07] H. Cohn and A. Kumar, Universally optimal distribution of points on spheres, J. Amer. Math. Soc. 20 (2007), 99-148. MR2257398 doi:10.1090/S0894-0347-06-00546-7 arXiv:math/0607446

[CW12] H. Cohn and J. Woo, Three-point bounds for energy minimization, J. Amer. Math. Soc. 25 (2012), 929-958. MR2947943 doi:10.1090/S0894-0347-2012-00737-1 arXiv:1103.0485

[CS99] J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups, third edition, Grundlehren der Mathematischen Wissenschaften 290, Springer-Verlag, New York, 1999. MR1662447

[D72] P. Delsarte, Bounds for unrestricted codes, by linear programming, Philips Res. Rep. 27 (1972), 272-289. MR0314545

[D73] P. Delsarte, An algebraic approach to the association schemes of coding theory, Philips Res. Rep. Suppl. 10 (1973), vi+97 pp. MR0384310

[DL98] P. Delsarte and V. I. Levenshtein, Association schemes and coding theory, IEEE Trans. Inform. Theory 44 (1998), 2477-2504. MR1658771 doi:10.1109/18.720545

[FC03] G. Ferrari and K. M. Chugg, Linear programming-based optimization of the distance spectrum of linear block codes, IEEE Trans. Inform. Theory 49 (2003), 1794-1800. MR1985580 doi:10.1109/TIT.2003.813483

[GST06] D. Gijswijt, A. Schrijver, and H. Tanaka, New upper bounds for nonbinary codes based on the Terwilliger algebra and semidefinite programming, J. Combin. Theory Ser. A 113 (2006), 1719-1731. MR2269550 doi:10.1016/j.jcta.2006.03.010

[H50] R. W. Hamming, Error detecting and error correcting codes, Bell System Tech. J. 29 (1950), 147-160. MR0035935

[H08] O. Heden, A survey of perfect codes, Adv. Math. Commun. 2 (2008), 223-247. MR2403049 doi:10.3934/amc.2008.2.223

[H78] R. Hill, Caps and codes, Discrete Math. 22 (1978), 111-137. MR523299 doi:10.1016/0012-365X(78)90120-6

[HJ13] R. A. Horn and C. R. Johnson, Matrix analysis, second edition, Cambridge University Press, 2013.

[K07] T. Kløve, Codes for error detection, Series on Coding Theory and Cryptology 2, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2007. MR2351823 doi:10.1142/9789812770516

[L92] V. I. Levenshtein, Designs as maximum codes in polynomial metric spaces, Acta Appl. Math. 29 (1992), 1-82. MR1192833 doi:10.1007/BF00053379

[L95] V. I. Levenshtein, Krawtchouk polynomials and universal bounds for codes and designs in Hamming spaces, IEEE Trans. Inform. Theory 41 (1995), 1303-1321. MR1366326 doi:10.1109/18.412678

[vL99] J. H. van Lint, Introduction to coding theory, third edition, Graduate Texts in Mathematics 86, Springer-Verlag, Berlin, 1999. MR1664228

[MS77] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes, North-Holland Mathematical Library 16, North-Holland Publishing Co., Amsterdam, 1977. MR0465509

[MT09] W. J. Martin and H. Tanaka, Commutative association schemes, European J. Combin. 30 (2009), 1497-1525. MR2535398 doi:10.1016/j.ejc.2008.11.001 arXiv:0811.2475

[M04] R. J. McEliece, The theory of information and coding, student edition, Encyclopedia of Mathematics and its Applications 86, Cambridge University Press, Cambridge, 2004. MR2136604

[MU07] A. Montanari and R. Urbanke, Modern coding theory: the statistical mechanics and computer science point of view, in J.-P. Bouchaud, M. Mézard, and J. Dalibard (eds.), Complex systems, Les Houches, Session LXXXV (2006), pp. 67-130, Elsevier, Amsterdam, 2007. doi:10.1016/S0924-8099(07)80009-6 arXiv:0704.2857

[NR67] A. W. Nordstrom and J. P. Robinson, An optimum nonlinear code, Information and Control 11 (1967), 613-616. doi:10.1016/S0019-9958(67)90835-2

[O'K96] C. M. O'Keefe, Ovoids in PG(3,q): a survey, Discrete Math. 151 (1996), 175-188. MR1391265 doi:10.1016/0012-365X(94)00095-Z






[S01] A. Samorodnitsky, On the optimum of Delsarte's linear program, J. Combin. Theory Ser. A 96 (2001), 261-287. MR1864123 doi:10.1006/jcta.2001.3176

[S05] A. Schrijver, New code upper bounds from the Terwilliger algebra and semidefinite programming, IEEE Trans. Inform. Theory 51 (2005), 2859-2866. MR2236252 doi:10.1109/TIT.2005.851748

[SV91] J. Simonis and C. de Vroedt, A simple proof of the Delsarte inequalities, Des. Codes Cryptogr. 1 (1991), 77-82. MR1110015 doi:10.1007/BF00123961

[S89] N. Sourlas, Spin-glass models as error-correcting codes, Nature 339 (1989), 693-695. doi:10.1038/339693a0

[S94] N. Sourlas, Spin glasses, error-correcting codes and finite-temperature decoding, Europhys. Lett. 25 (1994), 159-164. MR1261274 doi:10.1209/0295-5075/25/3/001

[T92] J. A. Thas, M.D.S. codes and arcs in projective spaces: a survey, Matematiche (Catania) 47 (1992), 315-328. MR1275863

[TS06] S. Torquato and F. H. Stillinger, New conjectural lower bounds on the optimal density of sphere packings, Experiment. Math. 15 (2006), 307-331. MR2264469 doi:10.1080/10586458.2006.10128964 arXiv:math/0508381

[Y92] V. A. Yudin, The minimum of potential energy of a system of point charges (Russian), Diskret. Mat. 4 (1992), 115-121; translation in Discrete Math. Appl. 3 (1993), 75-81. MR1181534 doi:10.1515/dma.1993.3.1.75

Microsoft Research New England, One Memorial Drive, Cambridge, MA 02142 
E-mail address: cohn@microsoft.com 

Department of Mathematics, Massachusetts Institute of Technology, Cambridge, 
MA 02139 

E-mail address: yufeiz@mit.edu 



